
Full text of "Some Bayesian decision problems in a Markov chain."

Doctor of Philosophy 

SOME BAYESIAN DECISION PROBLEMS
IN A MARKOV CHAIN

JAMES JOHN MARTIN, Jr. 
Course I June 1965 






SOME BAYESIAN DECISION PROBLEMS
IN A MARKOV CHAIN

JAMES JOHN MARTIN, Jr.
B.S., University of Wisconsin
(1955)
M.S., U. S. Naval Postgraduate School
(1963)



Submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

at the
Massachusetts Institute of Technology
1965






SOME BAYESIAN DECISION PROBLEMS
IN A MARKOV CHAIN

by

JAMES JOHN MARTIN, Jr.



Submitted to the Department of Civil Engineering on May 14, 1965,
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy.



ABSTRACT 

Some Bayesian decision models which involve a finite Markov chain
with uncertain transition probabilities are studied in this report. The
principal theoretical features of these models are set forth and various
questions of numerical computation are considered.

It is assumed, for the most part, that the family of prior distributions
of the matrix of transition probabilities is closed under sampling. This
concept is defined and some properties of closed families of distributions
are obtained. It is shown that there are an arbitrarily large number of
such families, giving considerable generality to the entire study. A
discounted adaptive control model for a Markov chain with alternative
transition probabilities and rewards is then formulated as a set of
functional equations. These equations are shown to have a unique bounded
solution and a method of successive approximations is considered which
converges monotonically to this solution.

The means, variances, and covariances of the n-step transition
probabilities, the steady-state probabilities, the total discounted reward
vector, and the process gain are then considered. It is shown that,
under quite general conditions, the mean n-step transition probability
matrix approaches the matrix of steady-state probabilities as $n \to \infty$.
These results are applied to discounted terminal control models in which
a Markov chain with alternative transition probabilities and rewards
is sampled, at a cost, until a terminal decision point is reached. At
that time a terminal policy is chosen and the system is operated
indefinitely under this policy with no further sampling. It is shown
that a terminal decision point is reached with probability one under an
optimal sampling strategy. These models are formulated as functional
equations, which are shown to have a unique bounded solution, and
successive approximation techniques are investigated.

We then turn to fixed sample size analysis. The Whittle distribution,
the matrix beta distribution, and the beta-Whittle distribution are
introduced. It is assumed that a finite Markov chain with uncertain
transition probabilities is observed for n consecutive transitions, and
the prior-posterior and preposterior analysis is developed.



The report concludes with a summary of specific results for a
two-state Markov chain when the matrix of transition probabilities
has the matrix beta distribution.



Thesis Supervisor: Ronald A. Howard

Title: Associate Professor of Electrical Engineering and
Associate Professor of Industrial Management



ACKNOWLEDGMENTS

Professor Ronald A. Howard, who first quickened my interest in
Markov processes and Bayesian decision theory, supervised the research
reported here. Early drafts of the report were read by Professor
Gordon M. Kaufman, who made many helpful suggestions, and by Dr. George
R. Murray and Ralph L. Miller. Professors E. Farnsworth Bisbee, John
D. C. Little, and George P. Wadsworth also served as thesis readers.

The staff of the Massachusetts Institute of Technology Library
System, in particular, the members of the Information and Reference
Department, rendered valuable assistance. All computations were
performed at the Computation Center of the Massachusetts Institute
of Technology.

My entire graduate program was supported in full by the U. S. Navy,
an organization to which I am deeply indebted.

Finally, I must acknowledge the valuable contribution of my wife,
Betty, who carefully typed several versions of this study, the notation
of which can at best be described as tortuous.



TABLE OF CONTENTS

Title Page
Abstract
Acknowledgments
Table of Contents

Chapter 1. Introduction.
1.1 Bayesian Decision Theory and Markov Chains.
1.2 Definition and Notation.
1.2.1 Markov Chain with Alternatives.

Chapter 2. Families of Distributions Closed Under Sampling.
2.1 Families of Distributions Closed Under a Sampling Rule.
2.2 The Matrix Beta Distribution.
2.3 Families of Distributions Closed Under the Consecutive Sampling Rule or the $\nu$-Step Sampling Rule.
2.3.1 Families Closed Under Consecutive Sampling.
2.3.2 Families Closed Under $\nu$-Step Sampling.
2.3.3 Large Sample Theory.
2.4 Some General Properties of Closed Families of Distributions.

Chapter 3. Adaptive Control Problems.
3.1 Discounted Processes. Formulation.
3.2 Existence and Uniqueness of [symbol illegible].
3.3 Solution by Successive Approximations.
3.4 Recursive Computation.
3.5 Numerical Example.
3.6 Undiscounted Processes.

Chapter 4. Expected Steady-State Probabilities and Related Quantities.
4.1 The n-Step Transition Probability Matrix.
4.2 The Steady-State Probability Vector.
4.2.1 Existence of the Moments of [symbol illegible].
4.2.2 An Ergodic Theorem.
4.2.3 Successive Approximations.
4.2.4 Variance-Covariance Matrix of [symbol illegible].
4.3 Expected Discounted Reward Vector.
4.3.1 Expected Value of [symbol illegible].
4.3.2 A Functional Equation for [symbol illegible].
4.3.3 A Numerical Example.
4.3.4 Variance-Covariance Matrix.
4.4 The Process Gain.
4.4.1 Mean and Variance of $g(\tilde{P})$.
4.4.2 Approximations to the Mean and Covariance Matrix of [symbol illegible].
4.4.3 An Abelian Theorem.

Chapter 5. Terminal Control Problems.
5.1 Discounted Processes. Model I.
5.2 Existence and Uniqueness of Solutions. Successive Approximations.
5.3 Discounted Processes. Model II.
5.4 Approximate Terminal Decisions.
5.5 Undiscounted Processes.
5.5.2 Model IV.
5.6 Discounted Processes With Setup Costs.

Chapter 6. Distribution Theory.
6.1 The Whittle Distribution and Related Distributions.
6.1.1 The Whittle Distribution.
6.1.2 Moments of the Whittle Distribution.
6.1.3 The Whittle-1 Distribution.
6.1.4 The Whittle-2 Distribution.
6.1.5 Moments of the Whittle-2 Distribution.
6.2 The Multivariate Beta Distribution.
6.2.1 The Multivariate Beta Density Function.
6.2.2 The Multivariate Beta Distribution Function.
6.2.3 The Nonstandardized Multivariate Beta Distribution.
6.2.4 Marginal and Conditional Distributions.
6.2.5 Moment Formulas.
6.3 The Matrix Beta Distribution.
6.4 Extended Natural Conjugate Distributions.
6.5 The Beta-Whittle Distribution.
6.5.1 The Beta-Whittle Distribution.
6.5.2 The Beta-Whittle-2 Distribution.

Chapter 7. Fixed Sample Size Analysis.
7.1 Initial State Known.
7.1.1 Prior-Posterior Analysis.
7.1.2 Sampling Distributions and Preposterior Analysis.
7.2 Initial State Unknown.
7.2.1 Prior-Posterior Analysis.
7.2.2 Sampling Distributions and Preposterior Analysis.
7.3 System Operating in the Steady State.
7.3.1 Prior-Posterior Analysis.
7.3.2 Sampling Distributions and Preposterior Analysis.

Chapter 8. Specific Results for a Two-State Markov Chain.
8.1 Preliminaries.
8.2 Hypergeometric Coefficients.
8.3 Expected n-Step Transition Probabilities.
8.4 Expected Value of [symbol illegible].
8.5 Steady-State Probabilities.
8.6 Process Gain.
8.7 Total Discounted Reward Vector.
8.8 A Generalization of a Result of Shor.

Chapter 9. Concluding Remarks.

Appendix A. Glossary of Symbols.
Appendix B. Program VITERATION To Solve Equations (3.2.1) and (3.2.2).
Appendix C. Program PHI MATRIX To Compute Equation (4.1.2).
Appendix D. Program PIAPROX To Compute Equations (4.2.[illegible]).
Appendix E. Program VASIMP To Compute Equation (4.3.13).

Bibliography
Biographical Note



CHAPTER 1 
INTRODUCTION

1.1 Bayesian Decision Theory and Markov Chains.

The basic concept of a Markov chain was introduced by A. A. Markov
in 1907 and since that time the literature on the subject has grown
remarkably. Fundamental investigations by Kolmogorov in the 1930's
extended the mathematical theory to chains with an infinite number of
states; Doeblin and Doob made important contributions during the period
1935-1945. The present state of the theory of Markov chains is summarized
by Chung [12].

By 1930 it was well recognized that the Markov chain is a useful
model for a multitude of physical processes and an increasing number of
applications of the mathematical theory have been made to problems in
such fields as physics, chemistry, biology, and operations research. In
these applications it is generally assumed that the matrix of transition
probabilities is known, although, since 1954, questions of hypothesis-
testing and maximum-likelihood estimation have been investigated. These
latter results are summarized by Billingsley [10], who gives extensive
references.

During the past two decades Savage's interpretation of the work of
de Finetti on subjective probability has renewed interest in Bayesian
decision theory. Contributions in this area have been made by many
researchers, including Von Neumann, Wald, Blackwell, and Girshick, leading
to the current work of Raiffa and Schlaifer [33], which, to a large degree,
presents a unified theory of statistical decisions which is suitable for
applications.

Recent research at the Massachusetts Institute of Technology [13, 14,
38] has been directed toward the application of Bayesian decision theory
to various models based on Markov chains with uncertain transition
probabilities. These efforts have demonstrated both the feasibility of
such decision models and the need for a more thorough investigation of the
underlying mathematical theory. The present work attempts to establish a
theoretical basis for some decision models which involve a finite Markov
chain with uncertain transition probabilities; particular attention is
given to sequential decision models. While we have dealt, for the most
part, with matters of existence and convergence, the question of numerical
computation has not been neglected. There are, however, many problems
of numerical computation in this area which are yet to be solved.

In 1953, L. S. Shapley [36], using a game-theoretic formulation,
studied one of the earliest sequential decision models in a Markov chain
with alternative transition probabilities, which were assumed to be known.
Similar game formulations have been examined more recently by Zachrisson
[42] and Shor [37]. A more general class of Markovian decision models
with known transition probabilities has been investigated by Blackwell
[11], Derman [16], Howard [22], and others, using the techniques of linear
and dynamic programming. These models have been extended to semi-Markov
processes by Howard [23] and Jewell [24, 25]. Further references are given
by Jewell [25].

Silver [38] has investigated various questions in a Markov chain
with uncertain transition probabilities and rewards. In particular, he
has treated the problem of a natural conjugate distribution for the
data-generating process of a Markov chain and has attempted to find the
expected value of certain functions of the transition probabilities, such
as the steady-state probability vector. These results assumed a specific
prior distribution for the transition probabilities, a generalization of
the beta distribution which we shall call the matrix beta distribution.*
Many of Silver's results are generalized in the present work.

* Cf. Section 6.3.

Cozzolino [13] has examined a sequential decision model involving a
two-state chain with uncertain transition probabilities. In a related
study, Cozzolino, Gonzalez-Zubieta, and Miller [14] have developed
heuristic methods for treating sequential decisions in a Markov chain with
uncertain transition probabilities. Their findings are based on Monte
Carlo studies.

The results of the present study are obtained under the assumption
that the prior distribution function of the matrix of transition
probabilities belongs to a family of distributions which is closed under
consecutive sampling. This concept is formally defined in Chapter 2,
where some properties of such families of distributions are derived.
In particular, it is shown that there are an arbitrarily large number of
such families, thus providing considerable generality to the entire study.
Additional generality is obtained by stating all theorems in terms of
distribution functions and Riemann-Stieltjes integrals, making them
applicable to both discrete and continuous prior distributions.

In Chapter 3 we consider a discounted adaptive control model in
which alternative transition probabilities in a Markov chain are sampled
over an infinite time period. The problem of choosing a sequence of policies
which maximizes the expected discounted reward over an infinite period
is formulated in terms of a set of functional equations. It is shown
that these equations have a unique solution and a method of successive
approximations which converges monotonically to this solution is considered.

Certain functions of the transition probabilities, such as the n-step
transition probabilities, the steady-state probabilities, the discounted
total reward, and the gain, are treated in Chapter 4, where we obtain
recursive equations for the means, variances, and covariances of these
quantities. An important result of this chapter is a proof that, under
quite general conditions, the mean n-step transition probability matrix
approaches the matrix of mean steady-state probabilities as $n \to \infty$.

These results are applied in Chapter 5, where discounted and
undiscounted terminal control models are studied. In these models of a
Markov chain with alternative transition probabilities the decision-
maker can sample various alternatives by paying a sampling cost. After
a certain amount of information about the process is gained in this
manner it becomes profitable for him to cease sampling and to choose a
policy under which the system operates indefinitely. These models are
formulated as functional equations and it is shown that, with probability
one, a terminal decision point is reached under an optimal sampling
strategy. We then show that there exists a unique solution to the
functional equations and investigate a method of successive approximations.

The results of the first six chapters are obtained for any prior
distribution function which belongs to a family closed under consecutive
sampling. In Chapters 6-8 we consider a specific distribution for the
transition probabilities which we call the matrix beta distribution. This
distribution is defined in Chapter 6 and its main properties are derived.
We also introduce, in this chapter, the Whittle distribution and the
beta-Whittle distribution. These probability distributions are utilized
in Chapter 7, where we do prior-posterior and preposterior analysis for
a Markov chain which is observed under the consecutive sampling rule. The
transition count is identified as a sufficient statistic and is shown to
have the Whittle distribution conditional on a fixed value of the
transition probability matrix. The natural conjugate distribution for
this data-generating process is the matrix beta and the unconditional
distribution of the transition count is the beta-Whittle distribution.

In Chapter 8 we consider the results of Chapters 2-6 in the case of
a two-state Markov chain when the prior distribution of the transition
probabilities is matrix beta. Explicit formulas for the expected values
of various functions of the transition probabilities are given in terms
of the parameters of the matrix beta distribution.

The results of this study are summarized in Chapter 9 and areas for
future research are discussed.



1.2 Definition and Notation.

The matrix with generic element $p_{ij}$ is denoted $P = [p_{ij}]$; the row
vector with generic element $p_i$ is written $p = (p_1, \ldots, p_N)$. The matrix
$P^t$ is the transpose of $P$.

A vector $x = (x_1, \ldots, x_N)$ is a point in the N-dimensional Euclidean
space, $E_N$, and we shall use the customary norm, or distance function,
defined by

$$\|x\| = \Bigl[\sum_{i=1}^{N} x_i^2\Bigr]^{1/2}. \tag{1.2.1}$$

Similarly, the $M \times N$ matrix $P$ is a point in $E_{MN}$ and has the norm

$$\|P\| = \Bigl[\sum_{i=1}^{M} \sum_{j=1}^{N} p_{ij}^2\Bigr]^{1/2}. \tag{1.2.2}$$

Random quantities are denoted by the tilde; thus, $\tilde{P}$, $\tilde{p}$, $\tilde{p}_{ij}$ are,
respectively, a random matrix, a random vector, and a random variable.

Let $h(P)$ be a scalar function of the $M \times N$ matrix $P$. Assume that
each row of $P$ is subject to the constraint

$$\sum_{j=1}^{N} p_{ij} = 1, \qquad i = 1, \ldots, M. \tag{1.2.3}$$

If $F(P)$ is a distribution function, the Riemann-Stieltjes integral of $h$
is to be interpreted as an $M(N-1)$-fold iterated integral over the
independent elements of $P$:

$$\int h(P)\,dF(P) = \int \cdots \int h(P)\,dF(p_{11}, \ldots, p_{1,N-1}, \ldots, p_{M1}, \ldots, p_{M,N-1}). \tag{1.2.4}$$

If $H(P) = [h_{ij}(P)]$ is a matrix-valued function of $P$, the Riemann-Stieltjes
integral of $H$ is to be interpreted as the matrix of the integrals of each
element:

$$\int H(P)\,dF(P) = \Bigl[\int h_{ij}(P)\,dF(P)\Bigr]. \tag{1.2.5}$$

1.2.1 Markov Chain with Alternatives. When we refer to a Markov
chain with alternatives we mean the following process. Let there be N
states which the system can occupy. When the system is in state i, the
decision-maker can choose one of $K_i$ alternative transition vectors,
$p_i^k = (p_{i1}^k, \ldots, p_{iN}^k)$, where $p_{ij}^k$ is the probability that the system makes
a transition to state j, given that it is currently in state i and the kth
alternative is used. The vectors $p_i^k$ are stochastic vectors; that is,

$$p_{ij}^k \ge 0, \qquad k = 1, \ldots, K_i;\ i, j = 1, \ldots, N, \tag{1.2.6a}$$

$$\sum_{j=1}^{N} p_{ij}^k = 1, \qquad k = 1, \ldots, K_i;\ i = 1, \ldots, N. \tag{1.2.6b}$$

With each transition vector, $p_i^k$, is associated a reward vector,
$r_i^k = (r_{i1}^k, \ldots, r_{iN}^k)$, where $r_{ij}^k$ is the reward earned when the system
makes a transition from state i to state j under the kth alternative
($-\infty < r_{ij}^k < \infty$; $k = 1, \ldots, K_i$; $i, j = 1, \ldots, N$).

The transition vectors can be arranged in a $K \times N$ matrix, $\mathcal{P}$, where

$$K = \sum_{i=1}^{N} K_i$$

and

$$\mathcal{P} = \begin{bmatrix} p_1^1 \\ \vdots \\ p_1^{K_1} \\ \vdots \\ p_N^1 \\ \vdots \\ p_N^{K_N} \end{bmatrix}. \tag{1.2.7}$$

Let the corresponding reward matrix be denoted by $\mathcal{R}$. Reserving the term
stochastic matrix for square matrices of non-negative elements whose
rows sum to unity, we shall call a $K \times N$ matrix whose elements satisfy
(1.2.6) a generalized stochastic matrix.

A policy consists of the selection of one alternative in each state
and may be expressed as a row vector $\sigma = (\sigma_1, \ldots, \sigma_N)$, where $\sigma_i$ is
the index of the alternative selected in state i ($\sigma_i = 1, \ldots, K_i$). The
stochastic matrix which governs the transitions of the Markov chain under
a specific policy, $\sigma$, will be denoted by $P(\sigma)$ or, if no confusion will
result, by $P$. The corresponding reward matrix under policy $\sigma$ is $R(\sigma)$
or $R$. The set of all possible policy vectors, $\sigma$, is denoted $\Sigma$ and is
a finite set.

The matrix $\mathcal{P}$ can be regarded as the parameter of a Markov chain with
alternatives; uncertainty about $\mathcal{P}$ is expressed by regarding $\tilde{\mathcal{P}}$ as a
random matrix with a prior distribution function, $H(\mathcal{P} \mid \psi)$, which has the
parameter $\psi$. In general, $\psi$ is a point in a multidimensional Euclidean
space. The range set of $\tilde{\mathcal{P}}$ is the set of all $K \times N$ generalized stochastic
matrices, denoted $\mathcal{S}_{K,N}$:

$$\mathcal{S}_{K,N} = \Bigl\{\mathcal{P} \Bigm| \mathcal{P} \text{ is } K \times N,\ p_{ij}^k \ge 0,\ \sum_{j=1}^{N} p_{ij}^k = 1\ (k = 1, \ldots, K_i;\ i, j = 1, \ldots, N)\Bigr\}. \tag{1.2.8}$$

We remark that $\mathcal{S}_{K,N}$ is a closed and bounded, hence, compact, subset of
the KN-dimensional Euclidean space, $E_{KN}$. The distribution function,
$H(\mathcal{P} \mid \psi)$, is a function of the $K(N-1)$ independent elements of $\mathcal{P}$,
$p_{i1}^k, \ldots, p_{i,N-1}^k$, for $k = 1, \ldots, K_i$ and $i = 1, \ldots, N$. $H(\mathcal{P} \mid \psi)$ has the
usual properties of a multivariate distribution function; in particular,

$$\int_{\mathcal{S}_{K,N}} dH(\mathcal{P} \mid \psi) = 1. \tag{1.2.9}$$

From $H(\mathcal{P} \mid \psi)$ can be obtained the marginal distributions of the $\prod_{i=1}^{N} K_i$
possible stochastic matrices, $P(\sigma)$. The marginal distribution function
of $\tilde{P}(\sigma)$ is denoted $F_{\sigma}(P \mid \psi)$ or, when the dependence on $\sigma$ is clear, simply
$F(P \mid \psi)$. The range set of $\tilde{P}$ is $\mathcal{S}_N$, the set of all $N \times N$ stochastic
matrices:

$$\mathcal{S}_N = \Bigl\{P \Bigm| P \text{ is } N \times N,\ p_{ij} \ge 0,\ \sum_{j=1}^{N} p_{ij} = 1\ (i = 1, \ldots, N)\Bigr\}. \tag{1.2.10}$$
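
The notation of Section 1.2.1 can be made concrete with a small sketch. The two-state chain below, its numerical entries, and the function names are hypothetical and chosen only for illustration; indices are 0-based, unlike the 1-based indices of the text. The sketch checks conditions (1.2.6), builds the stochastic matrix governed by a policy, and enumerates the finite policy set.

```python
import itertools
import numpy as np

# Hypothetical example: state 0 has K = 2 alternatives, state 1 has K = 1.
# alternatives[i][k] plays the role of the transition vector p_i^k.
alternatives = [
    [np.array([0.9, 0.1]), np.array([0.4, 0.6])],
    [np.array([0.3, 0.7])],
]

def is_generalized_stochastic(alts, tol=1e-9):
    """Check (1.2.6): each alternative row is non-negative and sums to one."""
    return all(
        bool(np.all(row >= 0.0)) and abs(row.sum() - 1.0) <= tol
        for rows in alts
        for row in rows
    )

def policy_matrix(alts, policy):
    """Stack the row selected in each state into the N x N matrix of the policy."""
    return np.vstack([alts[i][k] for i, k in enumerate(policy)])

def all_policies(alts):
    """Enumerate every policy; there are prod(K_i) of them."""
    return list(itertools.product(*(range(len(rows)) for rows in alts)))

policies = all_policies(alternatives)
P = policy_matrix(alternatives, (1, 0))
```

Here `policies` has two members because the numbers of alternatives are 2 and 1, and `P` is the stochastic matrix obtained by using the second alternative in the first state.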



CHAPTER 2 
FAMILIES OF DISTRIBUTIONS CLOSED
UNDER SAMPLING

Much of the discussion presented in the following chapters is
carried out under the assumption that $H(\mathcal{P} \mid \psi)$, the prior distribution
of $\tilde{\mathcal{P}}$, is a member of a family of distributions closed under a given
sampling rule. We formally define this concept in the present chapter
and derive some properties of such closed families of distributions
which will be used in the sequel.

The notion of a family of distributions closed under sampling is,
of course, not a new one. In Great Britain, G. A. Barnard [5] in 1954
and, more recently, G. B. Wetherill [#] have applied this concept to
sampling inspection problems. In this country, R. Bellman [6] and Bellman
and Kalaba [8] have used the idea in connection with adaptive control
processes. A particular class of distributions closed under sampling,
known as natural conjugate distributions, forms the basis of recent
research by Raiffa and Schlaifer [33] in statistical decision theory.

The properties of closed families of distributions which are derived
in this chapter and their application to decision problems in a Markov
chain with alternatives are original with the present work.

Consider a sequence of transitions within a Markov chain with
alternatives. A sampling rule is a set of specifications which determine
the following:

a. The distribution of the initial state of the chain and
the initial policy under which the process is operated.

b. The transitions at which policy changes occur. These
transitions may be determined probabilistically.

c. The distribution of the new policy when a policy change
occurs. This distribution is a probability mass function over
the set of policies, $\Sigma$, and allows for randomized selection of
policies.

d. The transitions at which the state of the process is made
known to the decision-maker. These transitions may be determined
probabilistically and, when they do occur, an observation of the
process is said to have taken place. Thus, an observation of the
process is a random variable whose range is the set of state
indices, $\{1, \ldots, N\}$.

e. A rule for termination of sampling.

We adopt the convention that, if a policy change or an observation
occurs at the nth transition, it takes place immediately after the nth
transition has occurred.

There are two sampling rules which are of particular importance in
succeeding chapters, consecutive sampling and $\nu$-step sampling.

A consecutive sampling rule of size n is characterized as follows.
A specific initial state and initial policy are selected with probability
one. A total of n transitions are to occur, with n selected in advance.
Each transition is observed. Policy changes, if they occur, take place
at predetermined transitions and, at each change, a predetermined policy
is chosen with probability one. Thus, a consecutive sampling rule of
size n consists of n consecutive observations of the states of a Markov
chain with alternatives under a sequence of policies which is selected
in advance of sampling.

A $\nu$-step sampling rule of size n is characterized as follows. A
positive integer, n, a sequence of n positive integers, $(\nu_1, \ldots, \nu_n)$,
and a sequence of n policies, $(\sigma^1, \ldots, \sigma^n)$, are selected in advance
of sampling. We allow the possibility that some or all of the $\nu_i$ are
equal. A specific initial state is chosen with probability one and a
sequence of $\nu_1$ transitions are allowed to occur under the policy $\sigma^1$.
The state of the Markov chain is observed after the $\nu_1$th transition.
Then $\nu_2$ transitions occur under policy $\sigma^2$, the state being observed after
the $\nu_2$th transition, and so on. A total of n observations are taken in
this manner. The $\nu$-step sampling rule will be used in one of the terminal
control models of Chapter 5.
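
The two sampling rules just described can be sketched as short simulations. This is only an illustration under stated assumptions: the data layout (`alts[i][k]` holding the transition vector for alternative k in state i), the function names, and the use of NumPy's random generator are all hypothetical, and states are 0-based.

```python
import numpy as np

def consecutive_sample(alts, policy_schedule, x0, rng):
    """Observe every transition. policy_schedule[t] is the predetermined
    policy in force for transition t + 1. Returns [x_0, x_1, ..., x_n]."""
    states = [x0]
    for policy in policy_schedule:
        i = states[-1]
        row = alts[i][policy[i]]
        states.append(int(rng.choice(len(row), p=row)))
    return states

def nu_step_sample(alts, policies, nus, x0, rng):
    """Let nus[m] unobserved transitions occur under policies[m], then
    record the state. Returns the n observed states only."""
    observed = []
    state = x0
    for policy, nu in zip(policies, nus):
        for _ in range(nu):
            row = alts[state][policy[state]]
            state = int(rng.choice(len(row), p=row))
        observed.append(state)
    return observed

# Hypothetical two-state chain with one alternative per state.
chain = [[np.array([0.8, 0.2])], [np.array([0.5, 0.5])]]
rng = np.random.default_rng(1965)
path = consecutive_sample(chain, [(0, 0)] * 10, 0, rng)
spaced = nu_step_sample(chain, [(0, 0)] * 4, [1, 2, 2, 3], 0, rng)
```

With every $\nu_i$ equal to one and the same policy sequence, the second function records the same kind of data as the first, apart from the initial state.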

We now proceed with the definition of a family of distributions
closed under a sampling rule. A collection, $\mathcal{H}$, of probability distribution
functions is said to be a family of distributions indexed by $\psi$ if all
members of the collection have the same functional form and differ only
in the values assigned to the parameter $\psi$. The set of values which $\psi$
can assume is denoted $\Psi$, termed the admissible parameter set. The
admissible parameter set is assumed to be a connected subset of a
(possibly multidimensional) Euclidean space.

Let a sampling rule be specified and assume that a sample of n
observations, $x_n = (x_1, \ldots, x_n)$, has resulted under that sampling rule.
Denote by $\ell(x_n \mid \mathcal{P})$ the likelihood of the sample $x_n$ under the given sampling
rule given that $\tilde{\mathcal{P}} = \mathcal{P}$. Let the prior distribution function of $\tilde{\mathcal{P}}$ be
$H(\mathcal{P} \mid \psi)$, a member of $\mathcal{H}$, a family of distributions indexed by $\psi$. Then,
if $dH(\mathcal{P} \mid \psi)$ is the prior probability that $\tilde{\mathcal{P}}$ lies in an infinitesimal
neighborhood of $\mathcal{P}$, the posterior distribution function of $\tilde{\mathcal{P}}$ is
$H(\mathcal{P} \mid \psi, x_n)$, defined by means of Bayes' Theorem,

$$dH(\mathcal{P} \mid \psi, x_n) = \frac{\ell(x_n \mid \mathcal{P})\,dH(\mathcal{P} \mid \psi)}{\int_{\mathcal{S}_{K,N}} \ell(x_n \mid \mathcal{P})\,dH(\mathcal{P} \mid \psi)}. \tag{2.1.1}$$

If $H(\mathcal{P} \mid \psi, x_n) \in \mathcal{H}$ for all $\psi \in \Psi$ and all samples $x_n$ of non-zero
probability, then $\mathcal{H}$ is said to be closed with respect to the sampling
rule which determines $\ell(x_n \mid \mathcal{P})$. In this case the posterior distribution
is denoted $H(\mathcal{P} \mid \psi'')$, where

$$\psi'' = T_{x_n}(\psi). \tag{2.1.2}$$

Here $T$ is the mapping of $\Psi$ into $\Psi$ induced by the transformation
(2.1.1) when $\mathcal{H}$ is closed under the given sampling rule.

In the special case where the sample consists of a single transition
from state i to state j under the kth alternative in state i, $\psi''$ will
be written

$$\psi'' = T_{ij}^{k}(\psi). \tag{2.1.3}$$

If a fixed policy $\sigma$ is in force, the superscript $k = \sigma_i$ may be
suppressed in (2.1.3).
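
As a minimal sketch of Bayes' Theorem (2.1.1), consider the simplest discrete case: a fixed policy, a known initial state, and a prior that puts mass on finitely many candidate transition matrices, so that the Stieltjes integral reduces to a sum. The candidate matrices, the function names, and the choice of a two-point prior are assumptions made for this illustration only.

```python
import numpy as np

def path_likelihood(P, states):
    """Probability of the observed path x_1, ..., x_n given matrix P
    and the known initial state x_0 = states[0]."""
    prob = 1.0
    for i, j in zip(states[:-1], states[1:]):
        prob *= P[i, j]
    return prob

def posterior_weights(prior, candidates, states):
    """Discrete analogue of (2.1.1): prior mass times likelihood,
    normalized by the sum over all candidates."""
    weights = np.array(
        [w * path_likelihood(P, states) for w, P in zip(prior, candidates)]
    )
    total = weights.sum()
    if total == 0.0:
        raise ValueError("the sample has zero probability under the prior")
    return weights / total

# Two hypothetical candidate matrices with equal prior mass.
A = np.array([[0.9, 0.1], [0.5, 0.5]])
B = np.array([[0.1, 0.9], [0.5, 0.5]])
post = posterior_weights([0.5, 0.5], [A, B], [0, 0, 0])
```

After two observed self-transitions in the first state, the posterior mass shifts toward the candidate with the larger self-transition probability. The family of all distributions on a fixed finite set of candidates is closed in the sense defined above, with the vector of weights playing the role of the index.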

In Section 2.3, families of distributions which are closed relative
to the consecutive sampling and $\nu$-step sampling rules are discussed in
detail. In order to carry out this discussion, some properties of the
matrix beta distribution are required. These properties are summarized
in the next section.

2.2 The Matrix Beta Distribution.

The matrix beta density function, defined by equation (2.2.1) below,
will be shown to be the natural conjugate distribution for the likelihood
function of the consecutive sampling rule and, hence, is of intrinsic
importance. Moreover, as will be seen in Section 2.3, many of the
properties of arbitrary families of distributions which are closed relative
to the consecutive sampling rule or the $\nu$-step sampling rule are related
to characteristics of the matrix beta distribution. For these reasons,
the principal facts about this distribution are summarized in this section
without proof. Complete derivations are given in Chapter 6.

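The $K \times N$ random generalized stochastic matrix, $\tilde{\mathcal{P}} = [\tilde{p}_{ij}^k]$, is said
to have the matrix beta distribution with parameter $\mathcal{M} = [m_{ij}^k]$ if $\tilde{\mathcal{P}}$
has the joint density function

$$f_{M\beta}(\mathcal{P} \mid \mathcal{M}) = \begin{cases} k(\mathcal{M}) \displaystyle\prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{m_{ij}^k - 1}, & \mathcal{P} \in \mathcal{S}_{K,N}, \\ 0, & \text{elsewhere.} \end{cases} \tag{2.2.1}$$

The normalizing constant, $k(\mathcal{M})$, is given by

$$k(\mathcal{M}) = \prod_{i=1}^{N} \prod_{k=1}^{K_i} \frac{\Gamma(M_i^k)}{\prod_{j=1}^{N} \Gamma(m_{ij}^k)}, \tag{2.2.2}$$

where

$$M_i^k = \sum_{j=1}^{N} m_{ij}^k, \qquad k = 1, \ldots, K_i;\ i = 1, \ldots, N, \tag{2.2.3}$$

and $\mathcal{M}$ is a $K \times N$ matrix such that

$$m_{ij}^k > 0, \qquad k = 1, \ldots, K_i;\ i, j = 1, \ldots, N. \tag{2.2.4}$$

It is shown in Chapter 6 that

$$\int_{\mathcal{S}_{K,N}} f_{M\beta}(\mathcal{P} \mid \mathcal{M})\,d\mathcal{P} = 1. \tag{2.2.5}$$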

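For $k = 1, \ldots, K_i$ and $i, j = 1, \ldots, N$, the means and variances of the
elements of $\tilde{\mathcal{P}}$ are given by the formulas

$$E[\tilde{p}_{ij}^k] = \frac{m_{ij}^k}{M_i^k} = \bar{p}_{ij}^k \tag{2.2.6}$$

and

$$\operatorname{var}[\tilde{p}_{ij}^k] = \frac{\bar{p}_{ij}^k (1 - \bar{p}_{ij}^k)}{M_i^k + 1}. \tag{2.2.7}$$

The covariances of the elements of $\tilde{\mathcal{P}}$ are, for $\gamma = 1, \ldots, K_{\alpha}$ and
$\alpha, \beta = 1, \ldots, N$,

$$\operatorname{cov}[\tilde{p}_{ij}^k, \tilde{p}_{\alpha\beta}^{\gamma}] = \begin{cases} -\dfrac{m_{ij}^k\, m_{i\beta}^k}{(M_i^k)^2 (M_i^k + 1)}, & i = \alpha,\ k = \gamma,\ j \ne \beta, \\ 0, & i \ne \alpha \text{ or } k \ne \gamma. \end{cases} \tag{2.2.8}$$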
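
The moment formulas (2.2.6)-(2.2.8) can be evaluated row by row, since the product form of (2.2.1) makes the rows of the random matrix independent, each with a Dirichlet (multivariate beta) density. The following is a minimal sketch under that reading; the function name and the example parameter row are hypothetical.

```python
import numpy as np

def row_moments(m_row):
    """Mean vector and covariance matrix of one row with parameters
    m_row = (m_i1^k, ..., m_iN^k), following (2.2.6)-(2.2.8)."""
    m = np.asarray(m_row, dtype=float)
    if np.any(m <= 0.0):
        raise ValueError("all parameters must be positive, as in (2.2.4)")
    M = m.sum()                      # row total, as in (2.2.3)
    mean = m / M                     # (2.2.6)
    # Diagonal: mean * (1 - mean) / (M + 1), which is (2.2.7).
    # Off-diagonal: -mean_j * mean_b / (M + 1), which is (2.2.8).
    cov = (np.diag(mean) - np.outer(mean, mean)) / (M + 1.0)
    return mean, cov

mean, cov = row_moments([2.0, 3.0, 5.0])
```

Because the elements of a row sum to one, each row of the covariance matrix sums to zero, which gives a quick consistency check on the signs in (2.2.8).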

Let $x_n = (x_0, x_1, \ldots, x_n)$ be a sample of n transitions observed under
the consecutive sampling rule, where $x_0$ is the initial state, known in
advance of sampling. Let $f_{ij}^k$ denote the number of transitions in $x_n$ from
state i to state j under the kth alternative in state i ($k = 1, \ldots, K_i$;
$i, j = 1, \ldots, N$) and define the transition count of the sample as the
$K \times N$ matrix $F = [f_{ij}^k]$. Then the conditional probability, given that
$\tilde{\mathcal{P}} = \mathcal{P}$, of observing the sample $x_n$ is

$$\prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{f_{ij}^k}. \tag{2.2.9}$$

If the rule by which the sample size n was selected is noninformative in
the sense of Raiffa and Schlaifer [33], then (2.2.9) is the likelihood of
the sample $x_n$. It is clear that $F$ is a sufficient statistic for this data-
generating process and that the natural conjugate distribution is the
matrix beta distribution.

Theorem 2.2.1 Let $\tilde{\mathcal{P}}$ have the matrix beta distribution with
parameter $\mathcal{M}'$ and suppose that a sample with transition count $F$ is observed
under the consecutive sampling rule with noninformative stopping. Then
the posterior distribution of $\tilde{\mathcal{P}}$ is matrix beta with parameter

$$\mathcal{M}'' = \mathcal{M}' + F. \tag{2.2.10}$$

Proof. By Bayes' Theorem the posterior distribution, $D(\mathcal{P} \mid \mathcal{M}', F)$,
is proportional to the product of the kernel of the likelihood function
and the kernel of the prior distribution,

$$D(\mathcal{P} \mid \mathcal{M}', F) \propto \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^k)^{m_{ij}'^{\,k} + f_{ij}^k - 1}. \tag{2.2.11}$$

The right side of (2.2.11) is the kernel of a matrix beta distribution
with parameter $\mathcal{M}' + F$. Q.E.D.

Corollary 2.2.2 The family of matrix beta distributions is closed
with respect to the consecutive sampling rule.

Proof. The corollary follows directly from Theorem 2.2.1. Q.E.D.
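
The update of Theorem 2.2.1 amounts to adding the transition count to the prior parameter. The sketch below illustrates this under an assumed data layout: the parameter is stored as one array per state, of shape (number of alternatives in that state) by N, and a sample is a list of hypothetical (i, k, j) triples recording a transition from state i to state j under alternative k, all 0-based.

```python
import numpy as np

def transition_count(transitions, prior):
    """Build the transition count F with the same block layout as the prior."""
    F = [np.zeros(block.shape) for block in prior]
    for i, k, j in transitions:
        F[i][k, j] += 1.0
    return F

def posterior_parameter(prior, F):
    """Equation (2.2.10): posterior parameter = prior parameter + count."""
    return [m + f for m, f in zip(prior, F)]

# Hypothetical prior: every parameter equal to one (two alternatives in
# state 0, one alternative in state 1).
prior = [np.ones((2, 2)), np.ones((1, 2))]
sample = [(0, 1, 1), (1, 0, 0), (0, 1, 1)]
F = transition_count(sample, prior)
post = posterior_parameter(prior, F)

# Posterior mean of the row for alternative 1 in state 0, by (2.2.6).
post_mean = post[0][1] / post[0][1].sum()
```

Only the rows whose alternatives were actually used change, which reflects the fact that the likelihood (2.2.9) carries no information about alternatives that were never sampled.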



2.3 Families of Distributions Closed Under the Consecutive Sampling Rule or the $\nu$-Step Sampling Rule.

In the following chapters we shall confine our attention to models
based on either the consecutive sampling rule or the $\nu$-step sampling
rule. Some properties of families of distributions which are closed
under either of these rules are established in this section. Specifically,
it is shown that there are an unlimited number of distinct families of
distributions which are closed under the consecutive sampling rule, thus
allowing the decision-maker considerable latitude in selecting a prior
distribution for $\tilde{\mathcal{P}}$. A lemma of fundamental importance for the development
of consecutive sampling models is next established. We then turn to
families of distributions closed under the $\nu$-step sampling rule and it
is shown that the class of such families is identical with the class of
families consisting of probability mixtures of distributions from a family
closed under consecutive sampling. It then follows that any family of
distributions closed under $\nu$-step sampling is also closed under consecutive
sampling. Finally, it is proven that, for an arbitrary prior distribution
on $\mathcal{S}_{K,N}$, if n observations of the Markov chain are obtained under either
sampling rule, then, with probability one, the probability mass of the
posterior distribution tends to concentrate at $\mathcal{P}$, the true state of
nature, as $n \to \infty$.

In Section 2.2 it was shown that the natural conjugate distribution for the consecutive sampling rule is the matrix beta distribution. Extended natural conjugate distributions for this sampling rule can be constructed as follows. Let $g(P \mid \omega)$ be a non-negative Borel function defined on $\mathcal{S}_{K,N}$ which is positive over some subset of $\mathcal{S}_{K,N}$. The parameter $\omega$ is a point belonging to $\Omega$, a subset of a Euclidean space. Let $M = [m_{ij}^{k}]$ be a $K \times N$ matrix with

$$m_{ij}^{k} > 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.1}$$

We assume that $g(P \mid \omega)$ is sufficiently well-behaved that the integral

$$\int_{\mathcal{S}_{K,N}} \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^{k})^{m_{ij}^{k} - 1}\, g(P \mid \omega)\, dP = 1 / C(M, \omega) \tag{2.3.2}$$

exists for all $\omega \in \Omega$ and all $M$ which satisfy (2.3.1). Let

$$h(P \mid M, \omega) = \begin{cases} C(M, \omega) \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^{k})^{m_{ij}^{k} - 1}\, g(P \mid \omega), & P \in \mathcal{S}_{K,N}, \\ 0, & \text{elsewhere.} \end{cases} \tag{2.3.3}$$

The function $h(P \mid M, \omega)$ is a non-negative Borel function such that

$$\int_{\mathcal{S}_{K,N}} h(P \mid M, \omega)\, dP = 1, \tag{2.3.4}$$

and is, therefore, a probability density function.

See Loève [29], pp. 106 ff., for a discussion of Borel functions. A function which is continuous at all but a finite number of points can be shown to be a Borel function.

Corresponding to any function $g(P \mid \omega)$ which satisfies the preceding requirements, we define the extended natural conjugate family, $\mathcal{H}_g$, indexed by the ordered pair $(M, \omega)$, as the collection of probability density functions, $h(P \mid M, \omega)$, defined by equation (2.3.3). The following theorem shows that $\mathcal{H}_g$ is closed under the consecutive sampling rule.

Theorem 2.3.1 Let $\mathcal{H}_g$ be a family of probability density functions, $h(P \mid M, \omega)$, as defined by equation (2.3.3). If the prior distribution on $\tilde{P}$ is $h(P \mid M', \omega') \in \mathcal{H}_g$ and if a sample $x_n = (x_0, \ldots, x_n)$, with transition count $F = [f_{ij}^{k}]$, is observed by consecutive sampling, then the posterior distribution of $\tilde{P}$ is $h(P \mid M' + F, \omega') \in \mathcal{H}_g$. Thus, $\mathcal{H}_g$ is closed under the consecutive sampling rule.

Proof. The posterior distribution of $\tilde{P}$, $D(P \mid M', \omega', x_n)$, is proportional to the product of the kernel of the likelihood function and the kernel of the prior density function,

$$D(P \mid M', \omega', x_n) \propto \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^{k})^{f_{ij}^{k} + {m'}_{ij}^{k} - 1}\, g(P \mid \omega'), \tag{2.3.5}$$

from which the theorem follows. Q.E.D.

The parameter $\omega$ provides the decision-maker with additional flexibility in encoding his prior knowledge about $\tilde{P}$. It is to be noted, however, that $\omega$ remains unchanged in the posterior distribution and is, in that sense, a nuisance parameter. An example of an extended natural conjugate distribution is presented in Section 6.4.
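The updating rule $M'' = M' + F$ of Theorem 2.2.1 can be illustrated with a short sketch. The function names, the dictionary layout keyed by (state, alternative), and the numbering of states from 0 are assumptions made for illustration only; they do not appear in the text.

```python
def transition_counts(states, alternatives, rows, n_states):
    """Tally the transition count F = [f_ij^k] from a consecutive sample.

    states       -- (x_0, ..., x_n), the observed states, numbered from 0
    alternatives -- alternatives[t] is the alternative used in state states[t]
    rows         -- all (state, alternative) pairs of the chain
    """
    counts = {row: [0] * n_states for row in rows}
    for t, k in enumerate(alternatives):
        counts[(states[t], k)][states[t + 1]] += 1
    return counts


def posterior_parameter(prior_m, counts):
    """Return M'' = M' + F, element by element (hypothetical helper)."""
    if prior_m.keys() != counts.keys():
        raise ValueError("prior and counts must cover the same rows")
    return {row: [m + f for m, f in zip(prior_m[row], counts[row])]
            for row in prior_m}


# Two states; state 0 has alternatives 0 and 1, state 1 has alternative 0.
rows = [(0, 0), (0, 1), (1, 0)]
prior = {row: [1, 1] for row in rows}
F = transition_counts((0, 1, 0, 0, 1), (0, 0, 1, 1), rows, n_states=2)
posterior = posterior_parameter(prior, F)
```

Only the rows actually sampled change, so the posterior parameter can be carried forward one transition at a time.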

The next result is of fundamental importance for the development of the succeeding chapters. Some additional notation is required. Let $y = (y_1, \ldots, y_{KN})$ be a point in the Euclidean space $E_{KN}$ and let $I$ denote an interval in $E_{KN}$,

$$I = \{\, y : \alpha_i < y_i \le \beta_i \ (i = 1, \ldots, KN) \,\}, \tag{2.3.6}$$

where $\alpha_i < \beta_i$ $(i = 1, \ldots, KN)$. Let $Q$ be a partition of $I$ into a number of mutually exclusive and exhaustive intervals, $I_1, \ldots, I_n$. For each $I_\nu$ we define the volume

$$v(I_\nu) = \prod_{i=1}^{KN} (\beta_i^{\nu} - \alpha_i^{\nu}), \qquad \nu = 1, \ldots, n \tag{2.3.7}$$

and let $v = \max_\nu \{ v(I_\nu) \}$. Finally, let $X_{ij}^{k}$ denote the event that a transition occurs from state $i$ to state $j$ under the $k$th alternative in state $i$.



Lemma 2.3.2 Let $H(P \mid \psi) \in \mathcal{H}$, a family of distributions closed under the consecutive sampling rule, and let $g(P)$ be any integrable function of $P$ defined on $\mathcal{S}_{K,N}$. Then the following identity is valid:

$$\int p_{ij}^{k}\, g(P)\, dH(P \mid \psi) = \bar{p}_{ij}^{k}(\psi) \int g(P)\, dH(P \mid T_{ij}^{k}(\psi)), \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N, \tag{2.3.8}$$

where $\bar{p}_{ij}^{k}(\psi)$ is the marginal expectation of $\tilde{p}_{ij}^{k}$.

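Proof. Let $I$ be an interval in $E_{KN}$ which contains $\mathcal{S}_{K,N}$. For any partition $Q$ of $I$, let $P_\nu = [p_{ij}^{k}(\nu)]$ denote an arbitrary point of $I_\nu \cap \mathcal{S}_{K,N}$ and let $\Delta_\nu(\psi) = P[\tilde{P} \in I_\nu \cap \mathcal{S}_{K,N} \mid \psi]$ when $\tilde{P}$ has the distribution function $H(P \mid \psi)$. Then

$$\int p_{ij}^{k}\, g(P)\, dH(P \mid \psi) = \lim_{v \to 0} \sum_{\nu=1}^{n} p_{ij}^{k}(\nu)\, g(P_\nu)\, \Delta_\nu(\psi). \tag{2.3.9}$$

Using Bayes' Theorem, we have

$$\Delta_\nu(T_{ij}^{k}(\psi)) = \frac{P[X_{ij}^{k} \mid \tilde{P} \in I_\nu \cap \mathcal{S}_{K,N},\, \psi]\; \Delta_\nu(\psi)}{P[X_{ij}^{k} \mid \psi]}. \tag{2.3.10}$$

But

$$P[X_{ij}^{k} \mid \psi] = \bar{p}_{ij}^{k}(\psi)$$

and, by the mean value theorem, there is a point $P_\nu^{*} = [p_{ij}^{k}(\nu)^{*}]$ of $I_\nu \cap \mathcal{S}_{K,N}$ such that

$$P[X_{ij}^{k} \mid \tilde{P} \in I_\nu \cap \mathcal{S}_{K,N},\, \psi] = p_{ij}^{k}(\nu)^{*}.$$

Since $P_\nu$ is an arbitrary point of $I_\nu \cap \mathcal{S}_{K,N}$, we may set $p_{ij}^{k}(\nu) = p_{ij}^{k}(\nu)^{*}$, which yields

$$p_{ij}^{k}(\nu)\, \Delta_\nu(\psi) = \bar{p}_{ij}^{k}(\psi)\, \Delta_\nu(T_{ij}^{k}(\psi)),$$

and equation (2.3.9) becomes

$$\int p_{ij}^{k}\, g(P)\, dH(P \mid \psi) = \bar{p}_{ij}^{k}(\psi) \lim_{v \to 0} \sum_{\nu=1}^{n} g(P_\nu^{*})\, \Delta_\nu(T_{ij}^{k}(\psi)) = \bar{p}_{ij}^{k}(\psi) \int g(P)\, dH(P \mid T_{ij}^{k}(\psi)).$$

Q.E.D.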

2.3.2 Families Closed Under $\nu$-Step Sampling.

Let us now consider the likelihood function associated with a $\nu$-step sampling rule of size $n$. This sampling rule is described by the sequence of transition numbers, $\{\nu_1, \ldots, \nu_n\}$, and by the sequence of policies, $\{\sigma_1, \ldots, \sigma_n\}$. Let $x_n = (x_0, \ldots, x_n)$ denote the resulting observations, where $x_0$ is the known initial state. Letting $p_{ij}^{(\nu)}(\sigma)$ denote the $(i,j)$th element of the matrix $[P(\sigma)]^{\nu}$, the conditional probability, given that $\tilde{P} = P$, of observing the sample $x_n$ is

$$p_{x_0 x_1}^{(\nu_1)}(\sigma_1)\, p_{x_1 x_2}^{(\nu_2)}(\sigma_2) \cdots p_{x_{n-1} x_n}^{(\nu_n)}(\sigma_n) = \prod_{m=1}^{n} p_{x_{m-1} x_m}^{(\nu_m)}(\sigma_m). \tag{2.3.15}$$

If the rule by which the sample size $n$ was chosen is noninformative, then (2.3.15) is the likelihood of the sample $x_n$.
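As a minimal sketch of equation (2.3.15), the likelihood of a $\nu$-step sample can be computed by raising the transition matrix of each policy to the power of the corresponding transition interval. The helper names and the representation of a policy by a dictionary key are assumptions for illustration; states are numbered from 0.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, nu):
    """nu-step transition matrix [P(sigma)]^nu for an integer nu >= 1."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(nu):
        result = mat_mul(result, p)
    return result


def nu_step_likelihood(p_by_policy, states, intervals, policies):
    """Product of p^(nu_m)_{x_{m-1} x_m}(sigma_m) over m = 1, ..., n."""
    likelihood = 1.0
    for m in range(1, len(states)):
        p_nu = mat_pow(p_by_policy[policies[m - 1]], intervals[m - 1])
        likelihood *= p_nu[states[m - 1]][states[m]]
    return likelihood


# One policy "a"; the chain is observed after 2 steps and then after 1 step.
P = {"a": [[0.5, 0.5], [0.2, 0.8]]}
value = nu_step_likelihood(P, states=(0, 1, 1), intervals=(2, 1),
                           policies=("a", "a"))
```

Here the two-step probability from state 0 to state 1 is 0.65 and the one-step probability from state 1 to state 1 is 0.8, so the likelihood is their product.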

Let $\mathcal{H}$ be a family of probability distributions indexed by $\psi \in \Psi$. For any fixed positive integer $m$ let $a = (a_1, \ldots, a_m)$ be a stochastic vector. A probability mixture of distributions from $\mathcal{H}$ is defined to be the weighted sum

$$H^{*}(P \mid \psi_1, \ldots, \psi_m, a) = \sum_{i=1}^{m} a_i\, H(P \mid \psi_i), \tag{2.3.16}$$

where $H(P \mid \psi_i) \in \mathcal{H}$ $(i = 1, \ldots, m)$. It is clear from the definition that $H^{*}(P \mid \psi_1, \ldots, \psi_m, a)$ is also a probability distribution function for $\tilde{P}$. The mixed extension of $\mathcal{H}$ is defined to be $\mathcal{H}^{*}$, the family of all probability mixtures of distributions from $\mathcal{H}$ as $a$ ranges over $\mathcal{S}_m$, the set of $m$-dimensional stochastic vectors (for fixed $m$), and as $m$ ranges over the positive integers. Since $H(P \mid \psi)$ is trivially a probability mixture, $\mathcal{H} \subset \mathcal{H}^{*}$.

The following theorems establish that a family of distributions is closed under $\nu$-step sampling if and only if it is the mixed extension of a family closed under consecutive sampling and that such a mixed extension is also closed under the consecutive sampling rule.

Theorem 2.3.3 Let $\mathcal{H}$ be a family of distributions closed under consecutive sampling and let $\mathcal{H}^{*}$ be its mixed extension. Then $\mathcal{H}^{*}$ is also closed under consecutive sampling.

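Proof. Let $x_n$ denote a sample of size $n$ obtained by consecutive sampling. If the prior distribution of $\tilde{P}$ is $H^{*}(P \mid \psi_1', \ldots, \psi_m', a') \in \mathcal{H}^{*}$, where $a' = (a_1', \ldots, a_m')$, and if $\ell(x_n \mid P)$ is the likelihood function, then the posterior distribution is, using Bayes' Theorem and equation (2.3.16),

$$dH^{*}(P \mid \psi_1', \ldots, \psi_m', a', x_n) = \frac{\ell(x_n \mid P)\, dH^{*}(P \mid \psi_1', \ldots, \psi_m', a')}{\int \ell(x_n \mid P)\, dH^{*}(P \mid \psi_1', \ldots, \psi_m', a')} = \frac{\sum_{i=1}^{m} a_i'\, \ell(x_n \mid P)\, dH(P \mid \psi_i')}{\sum_{j=1}^{m} a_j' \int \ell(x_n \mid P)\, dH(P \mid \psi_j')} = \sum_{i=1}^{m} a_i''\, dH(P \mid \psi_i''), \tag{2.3.17}$$

where

$$a_i'' = \frac{a_i' \int \ell(x_n \mid P)\, dH(P \mid \psi_i')}{\sum_{j=1}^{m} a_j' \int \ell(x_n \mid P)\, dH(P \mid \psi_j')}, \qquad i = 1, \ldots, m \tag{2.3.18}$$

and $\psi_i''$ is defined by equation (2.1.2). Since $a'' = (a_1'', \ldots, a_m'')$ is a stochastic vector, the posterior distribution of $\tilde{P}$ is

$$H^{*}(P \mid \psi_1'', \ldots, \psi_m'', a'') \in \mathcal{H}^{*}.$$

Q.E.D.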
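The weight update (2.3.18) can be sketched for the special case in which every component of the mixture is a matrix beta distribution, an assumption made here only so that the integrals have a closed form. In that case $\int \ell(x_n \mid P)\, dH(P \mid \psi_i')$ is, row by row, a ratio of multivariate beta functions. All function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(alpha):
    """Logarithm of the multivariate beta function of a parameter row."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def log_marginal_likelihood(m, f):
    """log of the integral of prod p^f against a matrix beta prior with
    parameter m; m and f are lists of rows of equal shape."""
    return sum(log_beta([a + c for a, c in zip(m_row, f_row)]) - log_beta(m_row)
               for m_row, f_row in zip(m, f))


def update_mixture(weights, params, counts):
    """Posterior weights a'' and component parameters M_i' + F."""
    logs = [log_marginal_likelihood(m, counts) for m in params]
    shift = max(logs)  # subtract the maximum to avoid underflow in exp
    unnormalised = [w * exp(l - shift) for w, l in zip(weights, logs)]
    total = sum(unnormalised)
    new_weights = [u / total for u in unnormalised]
    new_params = [[[a + c for a, c in zip(m_row, f_row)]
                   for m_row, f_row in zip(m, counts)] for m in params]
    return new_weights, new_params


# One row, two states, two components; one transition into state 0 observed.
new_a, new_m = update_mixture([0.5, 0.5], [[[1, 1]], [[2, 1]]], [[1, 0]])
```

The component that gave the observed transition the larger prior mean probability (2/3 against 1/2) gains weight, which is what (2.3.18) prescribes.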

Theorem 2.3.4 Let $\mathcal{H}^{*}$ be a family of probability distributions indexed by $\psi^{*} \in \Psi^{*}$. A necessary and sufficient condition that $\mathcal{H}^{*}$ be closed under the $\nu$-step sampling rule is that $\mathcal{H}^{*}$ be the mixed extension of $\mathcal{H}$, a family of distributions closed under consecutive sampling.

Proof. First assume that $n = 1$. Let $X_{ij}(\nu, \sigma)$ denote the observation of a transition from $i$ to $j$ over a transition interval of length $\nu$ under the policy $\sigma = (\sigma_1, \ldots, \sigma_N)$. The likelihood function is

$$p_{ij}^{(\nu)}(\sigma) = \sum_{i_1=1}^{N} \cdots \sum_{i_{\nu-1}=1}^{N} p_{i i_1}^{\sigma_i}\, p_{i_1 i_2}^{\sigma_{i_1}} \cdots p_{i_{\nu-1} j}^{\sigma_{i_{\nu-1}}}, \qquad (\nu > 1) \tag{2.3.19}$$

which is the sum of $N^{\nu-1}$ terms, each of which is the likelihood function for a sample sequence of length $\nu$ observed under the consecutive sampling rule. Let $H^{*}(P \mid \psi^{*}) \in \mathcal{H}^{*}$ be the prior distribution of $\tilde{P}$. The differential form of the posterior distribution has the kernel

$$dH^{*}(P \mid \psi^{*}, X_{ij}(\nu, \sigma)) \propto \sum_{i_1=1}^{N} \cdots \sum_{i_{\nu-1}=1}^{N} p_{i i_1}^{\sigma_i} \cdots p_{i_{\nu-1} j}^{\sigma_{i_{\nu-1}}}\, dH^{*}(P \mid \psi^{*}). \tag{2.3.20}$$

If $\mathcal{H}^{*}$ is the mixed extension of a family closed under consecutive sampling, then Theorem 2.3.3 and equation (2.3.20) imply that $H^{*}(P \mid \psi^{*}, X_{ij}(\nu, \sigma)) \in \mathcal{H}^{*}$. Moreover, if $\mathcal{H}^{*}$ is not the mixed extension of a family of distributions closed under consecutive sampling, then $p_{i i_1}^{\sigma_i} \cdots p_{i_{\nu-1} j}^{\sigma_{i_{\nu-1}}}\, dH^{*}(P \mid \psi^{*})$ cannot be the kernel of a distribution in $\mathcal{H}^{*}$ for all $\nu$, $\sigma$, and $\psi^{*}$. Then, for some $\nu$, $\sigma$, and $\psi^{*}$, the posterior distribution is a probability mixture of distributions, not all of which are in $\mathcal{H}^{*}$, and, therefore, the posterior distribution is not a member of $\mathcal{H}^{*}$. Thus, we have established necessity and sufficiency for the case $n = 1$.

For $n > 1$, the differential form of the posterior distribution of $\tilde{P}$ has the kernel

$$dH^{*}(P \mid \psi^{*}, x_n) \propto \prod_{m=1}^{n} p_{x_{m-1} x_m}^{(\nu_m)}(\sigma_m)\, dH^{*}(P \mid \psi^{*}) \tag{2.3.21}$$

and the theorem follows by induction. Q.E.D.



Corollary 2.3.5 If $\mathcal{H}^{*}$ is a family of distributions closed under the $\nu$-step sampling rule, then $\mathcal{H}^{*}$ is also closed under the consecutive sampling rule.

Proof. The corollary follows immediately from Theorems 2.3.3 and 2.3.4. Q.E.D.



2.3.3 Large Sample Theory. Let $H(P \mid \psi)$ be an arbitrary prior distribution function of $\tilde{P}$. We now show that, if a sample of size $n$ is observed under either the consecutive or the $\nu$-step sampling rule, the probability mass of the posterior distribution tends, as $n \to \infty$, to concentrate at $Q$, the true state of nature, with probability one. This statement is made precise in Theorems 2.3.8 and 2.3.9. Not only are these results of interest on their own merits, but an important application of Theorems 2.3.8 and 2.3.9 will be made in Chapter 3, where the question of termination of sampling is considered for terminal control models.

Consider a sample of size $n$ obtained under the $\nu$-step sampling rule. For a fixed state $i$, a fixed policy $\sigma$, and a fixed transition interval $\nu$, we shall say a trial occurs whenever the system makes a transition from state $i$ to any other state over a transition interval of length $\nu$ under the policy $\sigma$. For a fixed state $j$, let there be associated with the $n$th trial the random variable $X_n(j)$ which takes the value 1 if the system is next observed in state $j$ and the value zero otherwise. A sample of size $n$ thus generates a sequence $\{X_1(j), \ldots, X_m(j)\}$ of independent, identically distributed random variables which, if $Q$ is the true state of nature, have the probability function

$$P[X_n(j) = 1] = q_{ij}^{(\nu)}(\sigma), \qquad n = 1, 2, \ldots; \quad j = 1, \ldots, N \tag{2.3.22a}$$

$$P[X_n(j) = 0] = 1 - q_{ij}^{(\nu)}(\sigma), \qquad n = 1, 2, \ldots; \quad j = 1, \ldots, N \tag{2.3.22b}$$

and expected value

$$E[X_n(j)] = q_{ij}^{(\nu)}(\sigma), \qquad n = 1, 2, \ldots; \quad j = 1, \ldots, N. \tag{2.3.23}$$



The following lemma is an immediate consequence of the strong law of large numbers.

Lemma 2.3.6 Let $(X_1(j), \ldots, X_m(j))$ be an observation of size $m$ of the sequence of trials defined above, for fixed states $i$ and $j$, a fixed policy $\sigma$, and a fixed transition interval $\nu$. If, as $n \to \infty$, state $i$ is entered an infinite number of times and the policy $\sigma$ and transition interval $\nu$ are used infinitely often when in state $i$, we have, with probability one,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{n=1}^{m} X_n(j) = q_{ij}^{(\nu)}(\sigma), \tag{2.3.24}$$

where $Q$ is the true state of nature.

We remark that, if $\nu = 1$ and $\sigma_i = k$, Lemma 2.3.6 applies to the consecutive sampling rule and equation (2.3.24) becomes

$$\lim_{m \to \infty} \frac{1}{m} \sum_{n=1}^{m} X_n(j) = q_{ij}^{k}, \qquad j = 1, \ldots, N, \tag{2.3.25}$$

the limit holding with probability one.
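A small simulation, offered only as an illustration of Lemma 2.3.6 and not as part of the argument, shows the relative frequencies of the trial outcomes settling near the true row of $Q$. The function name, the fixed seed, and the chosen row are assumptions.

```python
import random


def empirical_frequencies(q_row, trials, seed=0):
    """Simulate `trials` independent trials from state i, each landing in
    state j with probability q_row[j], and return (1/m) * sum X_n(j)."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    hits = [0] * len(q_row)
    for _ in range(trials):
        u = rng.random()
        cumulative = 0.0
        for j, q in enumerate(q_row):
            cumulative += q
            if u < cumulative:
                hits[j] += 1
                break
        else:
            hits[-1] += 1  # guard against rounding in the cumulative sum
    return [h / trials for h in hits]


true_row = [0.2, 0.5, 0.3]  # a hypothetical positive row of Q
freq = empirical_frequencies(true_row, trials=20000)
```

With 20000 trials the standard deviation of each frequency is below 0.004, so deviations of more than 0.02 from the true row would be very unusual.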

A generalized stochastic matrix, $P = [p_{ij}^{k}]$, is said to be positive if all of its elements are positive, which implies that

$$0 < p_{ij}^{k} < 1, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.26}$$



Lemma 2.3.7 Let $x_n$ be a sample of size $n$ obtained under the $\nu$-step sampling rule. Assume that, as $n \to \infty$, a fixed state $i$ is observed infinitely often and that, when in state $i$, the policy $\sigma$ and transition interval $\nu$ are used infinitely often. Then, if the true state of nature, $Q$, is a positive matrix, every state $j$ $(j = 1, \ldots, N)$ is, with probability one, observed infinitely often.

Proof. For fixed states $i$ and $j$, the policy $\sigma$, and the transition interval $\nu$, let $\{X_n(j)\}$ be the sequence of trials generated by the sample $x_n$, as defined above. The hypotheses of the lemma imply that $m \to \infty$ as $n \to \infty$ and we have, by Lemma 2.3.6,

$$\lim_{m \to \infty} \frac{1}{m} \sum_{n=1}^{m} X_n(j) = q_{ij}^{(\nu)}(\sigma), \qquad j = 1, \ldots, N \tag{2.3.27}$$

with probability one. Since $q_{ij}^{(\nu)}(\sigma) > 0$ for $j = 1, \ldots, N$, (2.3.27) implies that, with probability one, $X_n(j) = 1$ infinitely often for each state $j$. Q.E.D.

This lemma can probably be proved under the weaker assumption that $Q(\sigma)$ is ergodic, but it is sufficient for our purposes to assume that the true state of nature is a positive matrix. It will be shown in Chapter 4 that, for all prior distributions of $\tilde{P}$ which satisfy a mild continuity condition, the set of non-positive matrices is a set of measure zero.

We again remark that, by taking $\nu = 1$, Lemma 2.3.7 applies to samples obtained under the consecutive sampling rule as well as under the $\nu$-step sampling rule.

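Let $\varepsilon$ be an arbitrary positive number and define $\mathcal{E}$ to be the $K \times N$ matrix each element of which is $\varepsilon$. For any $K \times N$ matrices $P$ and $Q$ we say that

$$|P - Q| < \mathcal{E} \tag{2.3.28}$$

if

$$|p_{ij}^{k} - q_{ij}^{k}| < \varepsilon, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.29}$$

Clearly, if (2.3.28) holds, then

$$\|P - Q\| = \Big[ \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{K_i} (p_{ij}^{k} - q_{ij}^{k})^{2} \Big]^{1/2} < \varepsilon \sqrt{KN}, \tag{2.3.30}$$

and the norm, $\|P - Q\|$, can be made arbitrarily small by an appropriate choice of $\varepsilon$. Let $H(P \mid \psi)$ be an arbitrary prior distribution function of $\tilde{P}$ and assume that a sample, $x_n$, of size $n$ is observed. Denote by $H(P \mid \psi, x_n)$ the posterior distribution of $\tilde{P}$ and, for fixed $Q$, let

$$P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = \int_{S} dH(P \mid \psi, x_n) \tag{2.3.31}$$

denote the posterior probability of the set

$$S = \{\, P : |P - Q| < \mathcal{E} \,\} \subset \mathcal{S}_{K,N}. \tag{2.3.32}$$

When we say that the posterior probability mass tends, as $n \to \infty$, to concentrate at $Q$, the true state of nature, with probability one, we mean that, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = 1, \tag{2.3.33}$$

the limit holding with probability one.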
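The relation between the elementwise condition (2.3.29) and the norm bound (2.3.30) can be checked numerically. The following is a minimal sketch; the function names and the two small matrices are hypothetical.

```python
from math import sqrt


def within(p, q, eps):
    """True if every element of P differs from Q by less than eps (2.3.29)."""
    return all(abs(a - b) < eps
               for p_row, q_row in zip(p, q) for a, b in zip(p_row, q_row))


def norm_difference(p, q):
    """Euclidean norm of P - Q taken over all K*N elements."""
    return sqrt(sum((a - b) ** 2
                    for p_row, q_row in zip(p, q) for a, b in zip(p_row, q_row)))


# K = 3 rows (alternatives summed over states) and N = 2 states.
P = [[0.50, 0.50], [0.30, 0.70], [0.90, 0.10]]
Q = [[0.45, 0.55], [0.32, 0.68], [0.88, 0.12]]
eps = 0.06
bound = eps * sqrt(3 * 2)  # eps * sqrt(K * N)
```

Each of the $KN$ squared differences is below $\varepsilon^{2}$, which is why the norm cannot exceed $\varepsilon \sqrt{KN}$.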

Theorem 2.3.8 Let $H(P \mid \psi)$ be an arbitrary prior distribution function of $\tilde{P}$. Let $x_n$ be a sample of size $n$ obtained from a Markov chain with alternatives under the consecutive sampling rule. Assume that the sampling strategy is such that, as $n \to \infty$, if state $i$ is entered infinitely often, every alternative in state $i$ is sampled infinitely often $(i = 1, \ldots, N)$. If $Q$, the true state of nature, is a positive matrix, then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = 1, \tag{2.3.34}$$

the limit holding with probability one, provided $H(P \mid \psi)$ assigns positive probability to the set $S$ defined by equation (2.3.32).

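Proof. Let $F(n) = [f_{ij}^{k}(n)]$ be the transition count of the sample $x_n$. The posterior distribution of $\tilde{P}$ is $H(P \mid \psi, x_n)$, where

$$dH(P \mid \psi, x_n) = \frac{\prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^{k})^{f_{ij}^{k}(n)}\, dH(P \mid \psi)}{\int \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (p_{ij}^{k})^{f_{ij}^{k}(n)}\, dH(P \mid \psi)}. \tag{2.3.35}$$

Letting

$$m_{ij}^{k}(n) = f_{ij}^{k}(n) + 1 \tag{2.3.36}$$

and multiplying the numerator and denominator of (2.3.35) by the normalizing constant $B(M(n))$ defined by equation (2.2.2), we have

$$dH(P \mid \psi, x_n) = \frac{f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi)}{\int f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi)}. \tag{2.3.37}$$

Let

$$F_i^{k}(n) = \sum_{j=1}^{N} f_{ij}^{k}(n), \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N; \quad n = 1, 2, \ldots \tag{2.3.38}$$

denote the number of times that alternative $k$ is used in state $i$ in a sample of size $n$. As $n \to \infty$ at least one of the states of the chain is entered infinitely often. Lemma 2.3.7 and the hypotheses of the theorem imply that, with probability one, every state is entered infinitely often. Thus, under the assumed sampling strategy, $F_i^{k}(n) \to \infty$ as $n \to \infty$ with probability one $(k = 1, \ldots, K_i;\ i = 1, \ldots, N)$. The mean of the distribution $f_{M\beta}^{(K,N)}(P \mid M(n))$ is $\bar{P}(n) = [\bar{p}_{ij}^{k}(n)]$, where

$$\bar{p}_{ij}^{k}(n) = \frac{f_{ij}^{k}(n) + 1}{F_i^{k}(n) + N}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N; \quad n = 1, 2, \ldots \tag{2.3.39}$$

Thus, if $Q = [q_{ij}^{k}]$, Lemma 2.3.6 implies that, with probability one,

$$\lim_{n \to \infty} \bar{p}_{ij}^{k}(n) = q_{ij}^{k} > 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.40}$$

We now show that, as $n \to \infty$, the probability mass of $f_{M\beta}^{(K,N)}(P \mid M(n))$ tends, with probability one, to concentrate at $Q$. If $\tilde{P}$ is a random matrix with the density function $f_{M\beta}^{(K,N)}(P \mid M(n))$, the marginal variance of $\tilde{p}_{ij}^{k}$ is

$$\sigma_{ij}^{2\,k}(n) = \frac{\bar{p}_{ij}^{k}(n)\, [1 - \bar{p}_{ij}^{k}(n)]}{F_i^{k}(n) + N + 1}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N; \quad n = 1, 2, \ldots \tag{2.3.41}$$

Thus, with probability one,

$$\lim_{n \to \infty} \sigma_{ij}^{2\,k}(n) = 0, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N. \tag{2.3.42}$$

Let $\varepsilon > 0$ and $\delta$ $(0 < \delta < 1)$ be given. Define the set $S \subset \mathcal{S}_{K,N}$ as in equation (2.3.32) and let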

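$$P[\, |\tilde{P} - Q| < \mathcal{E} \mid M(n) \,] = \int_{S} f_{M\beta}^{(K,N)}(P \mid M(n))\, dP. \tag{2.3.43}$$

Since

$$S = \bigcap_{i,j,k} \{\, P : |p_{ij}^{k} - q_{ij}^{k}| < \varepsilon \,\}, \tag{2.3.44}$$

De Morgan's law yields

$$S^{c} = \bigcup_{i,j,k} \{\, P : |p_{ij}^{k} - q_{ij}^{k}| \ge \varepsilon \,\},$$

where $c$ denotes the set complement, and therefore

$$1 - P[\, |\tilde{P} - Q| < \mathcal{E} \mid M(n) \,] \le \sum_{i,j,k} P[\, |\tilde{p}_{ij}^{k} - q_{ij}^{k}| \ge \varepsilon \mid M(n) \,]. \tag{2.3.45}$$

Let $\tilde{e}_{ij}^{k} = \tilde{p}_{ij}^{k} - q_{ij}^{k}$. The marginal variance of $\tilde{e}_{ij}^{k}$ is $\sigma_{ij}^{2\,k}(n)$, so that

$$E[\, (\tilde{e}_{ij}^{k})^{2} \mid M(n) \,] = \sigma_{ij}^{2\,k}(n) + (\bar{p}_{ij}^{k}(n) - q_{ij}^{k})^{2}. \tag{2.3.46}$$

But

$$E[\, (\tilde{e}_{ij}^{k})^{2} \mid M(n) \,] = \int (p_{ij}^{k} - q_{ij}^{k})^{2}\, f_{M\beta}^{(K,N)}(P \mid M(n))\, dP \ge \varepsilon^{2}\, P[\, |\tilde{p}_{ij}^{k} - q_{ij}^{k}| \ge \varepsilon \mid M(n) \,] \tag{2.3.47}$$

and, using (2.3.46),

$$P[\, |\tilde{p}_{ij}^{k} - q_{ij}^{k}| \ge \varepsilon \mid M(n) \,] \le \varepsilon^{-2} \big[ \sigma_{ij}^{2\,k}(n) + (\bar{p}_{ij}^{k}(n) - q_{ij}^{k})^{2} \big], \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N; \quad n = 1, 2, \ldots \tag{2.3.48}$$

By equations (2.3.40) and (2.3.42), there exists an integer $n^{*}$ such that, for all $n > n^{*}$,

$$\varepsilon^{-2} \big[ \sigma_{ij}^{2\,k}(n) + (\bar{p}_{ij}^{k}(n) - q_{ij}^{k})^{2} \big] < \frac{\delta}{KN}, \qquad k = 1, \ldots, K_i; \quad i, j = 1, \ldots, N, \tag{2.3.49}$$

with probability one. Thus, for $n > n^{*}$, inequality (2.3.45) becomes

$$1 - P[\, |\tilde{P} - Q| < \mathcal{E} \mid M(n) \,] < \delta, \qquad n > n^{*} \tag{2.3.50}$$

and

$$P[\, |\tilde{P} - Q| < \mathcal{E} \mid M(n) \,] > 1 - \delta, \qquad n > n^{*} \tag{2.3.51}$$

with probability one. Since $\delta$ is arbitrary,

$$\lim_{n \to \infty} P[\, |\tilde{P} - Q| < \mathcal{E} \mid M(n) \,] = 1, \tag{2.3.52}$$

the limit holding with probability one.

Again defining $S$ as in (2.3.32) and letting $S^{c}$ be the complement of $S$ in $\mathcal{S}_{K,N}$, we have, from equation (2.3.37),

$$P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = \frac{\int_{S} f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi)}{\int_{S} f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi) + \int_{S^{c}} f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi)}. \tag{2.3.53}$$

Let $\delta(n)$ be the maximum of the continuous function $f_{M\beta}^{(K,N)}(P \mid M(n))$ on the compact set $S^{c}$. Equation (2.3.52) implies that $\lim_{n \to \infty} \delta(n) = 0$, with probability one. Thus,

$$P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] \ge \frac{\int_{S} f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi)}{\int_{S} f_{M\beta}^{(K,N)}(P \mid M(n))\, dH(P \mid \psi) + \delta(n) \int_{S^{c}} dH(P \mid \psi)} \tag{2.3.54}$$

and, with probability one,

$$\lim_{n \to \infty} P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = 1.$$

Q.E.D.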
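The posterior mean (2.3.39) and marginal variance (2.3.41) used in the proof are simple to evaluate for a single row of counts. The sketch below assumes, as in the proof, that the matrix beta parameter is $m_{ij}^{k}(n) = f_{ij}^{k}(n) + 1$; the function name is hypothetical.

```python
def row_moments(f_row):
    """Mean and marginal variance of each element of one row of P when the
    matrix beta parameter of that row is f_row + 1 (equations 2.3.39, 2.3.41)."""
    n_states = len(f_row)
    total = sum(f_row)  # number of times this alternative was used
    means = [(f + 1) / (total + n_states) for f in f_row]
    variances = [p * (1 - p) / (total + n_states + 1) for p in means]
    return means, variances


small_means, small_vars = row_moments([3, 1])
large_means, large_vars = row_moments([300, 100])
```

With a hundred times as many observations in the same proportions, the mean moves toward the observed frequency 0.75 and the variance shrinks roughly in proportion to the number of observations, which is the concentration the theorem describes.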

Theorem 2.3.9 Let $H(P \mid \psi)$ be an arbitrary prior distribution function of $\tilde{P}$. Let $x_n$ be a sample of size $n$ obtained from a Markov chain with alternatives under the $\nu$-step sampling rule. Assume that, when the system is observed in state $i$, the sampling rule is restricted to policies in a finite set $\Sigma_i$ and to transition intervals in the finite set $I_i = \{\nu_1, \ldots, \nu_{w_i}\}$, such that, as $n \to \infty$, if state $i$ is observed infinitely often, every policy in $\Sigma_i$ and every transition interval in $I_i$ are used infinitely often $(i = 1, \ldots, N)$. If $Q$, the true state of nature, is a positive matrix, then for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P_n[\, |\tilde{P} - Q| < \mathcal{E} \,] = 1, \tag{2.3.55}$$

the limit holding with probability one, provided $H(P \mid \psi)$ assigns positive probability to the set $S$ defined by equation (2.3.32).

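Proof. Let $K_i$ be the total number of ordered pairs, $(\sigma, \nu)$, where $\sigma \in \Sigma_i$ and $\nu \in I_i$ $(i = 1, \ldots, N)$, and let $K = \sum_{i=1}^{N} K_i$. When in state $i$, let $k$ index the possible policy and transition interval combinations, $(\sigma, \nu)$. For $\sigma \in \Sigma_i$ and $\nu \in I_i$, let

$$\pi_{ij}^{k} = p_{ij}^{(\nu)}(\sigma), \qquad k = 1, \ldots, K_i \tag{2.3.56}$$

and define the $K \times N$ matrix $\Pi = [\pi_{ij}^{k}]$. Clearly, $\Pi$ is a generalized stochastic matrix. If the index $k$ corresponds to the pair $(\sigma, \nu)$, let $f_{ij}^{k}(n)$ be the number of times a transition occurred from state $i$ to state $j$ in the sample $x_n$ over the transition interval $\nu$ when the system was governed by the policy $\sigma$. Then the posterior distribution of $\tilde{P}$ is

$$dH(P \mid \psi, x_n) = \frac{\prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (\pi_{ij}^{k})^{f_{ij}^{k}(n)}\, dH(P \mid \psi)}{\int \prod_{i=1}^{N} \prod_{j=1}^{N} \prod_{k=1}^{K_i} (\pi_{ij}^{k})^{f_{ij}^{k}(n)}\, dH(P \mid \psi)} = \frac{f_{M\beta}^{(K,N)}(\Pi \mid M(n))\, dH(P \mid \psi)}{\int f_{M\beta}^{(K,N)}(\Pi \mid M(n))\, dH(P \mid \psi)}, \tag{2.3.57}$$

where $M(n) = [f_{ij}^{k}(n) + 1]$. The proof of the theorem from this point is identical to the proof of Theorem 2.3.8. Q.E.D.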

We remark that the assumptions in Theorems 2.3.8 and 2.3.9 concerning policies which are used infinitely often are not restrictive. It will usually be possible, after a finite amount of sampling, to eliminate from further consideration those policies which are used only a finite number of times. Examples of such elimination of policies by dominance arguments will be given in Chapter 5. In any case, the theorems apply to the marginal distribution of those alternative rows of $\tilde{P}$ which are observed infinitely often.

Let $\mathcal{H}$ be a family of distributions indexed by $\psi \in \Psi$ which is closed under an arbitrary sampling rule. Some general properties of $\mathcal{H}$ are derived in this section. The symbol $\ell(x_n \mid P)$ will be used throughout for the likelihood of a sample of size $n$, conditional on $\tilde{P} = P$, under the given sampling rule.

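Theorem 2.4.1 Let $\tilde{P}$ have a discrete prior distribution,

$$P[\, \tilde{P} = P_i \,] = a_i, \qquad P_i \in \mathcal{S}_{K,N}, \quad i = 1, 2, \ldots, m \tag{2.4.1}$$

where $a_i \ge 0$, $\sum_{i=1}^{m} a_i = 1$. For a fixed integer, $m$, let $\mathcal{H}$ be the family of all such discrete distributions, indexed by $a = (a_1, \ldots, a_m) \in \mathcal{S}_m$. Then $\mathcal{H}$ is closed under all sampling rules.

Proof. Let $\ell(x_n \mid P)$ be the likelihood function for an arbitrary sampling rule. If $a'$ is the prior distribution of $\tilde{P}$, the posterior probability of $P_i$ is

$$a_i'' = \frac{a_i'\, \ell(x_n \mid P_i)}{\sum_{j=1}^{m} a_j'\, \ell(x_n \mid P_j)}, \qquad i = 1, \ldots, m. \tag{2.4.2}$$

Since $a_i'' \ge 0$ and $\sum_{i=1}^{m} a_i'' = 1$, $a'' = (a_1'', \ldots, a_m'') \in \mathcal{S}_m$. Q.E.D.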
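The update in the proof of Theorem 2.4.1 amounts to reweighting a finite list of candidate matrices by their likelihoods. A minimal sketch follows; the function name and the numerical values are hypothetical, and the likelihood values are taken as given rather than computed from a particular sampling rule.

```python
def discrete_posterior(prior, likelihoods):
    """Posterior weights a'' over a finite set of candidate matrices.

    prior       -- a' = (a_1', ..., a_m'), a stochastic vector
    likelihoods -- likelihood of the observed sample under each candidate
    """
    joint = [a * l for a, l in zip(prior, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("the sample has zero probability under the prior")
    return [x / total for x in joint]


posterior = discrete_posterior([0.5, 0.3, 0.2], [0.1, 0.2, 0.4])
```

The result is again a stochastic vector of the same length, which is the closure property the theorem asserts.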



This theorem, while almost trivial, is of considerable importance for the solution of Bayesian decision problems in a Markov chain in practice. In many cases it may be feasible to place positive probability on only a finite set of points of $\mathcal{S}_{K,N}$ and to solve the corresponding discrete problems, thus considerably simplifying the computations. We shall not emphasize this consideration any further since most of our theorems are stated in terms of Stieltjes integrals and, hence, are applicable to discrete, continuous, and mixed prior distributions.

Cf. Silver [38], ch. 2.

Theorem 2.4.2 Let $\mathcal{H} = \{ H(P \mid \psi) \mid \psi \in \Psi \}$ be a family of distributions closed under a given sampling rule and, for a fixed policy $\sigma$, let $\mathcal{H}_\sigma = \{ F_\sigma(P \mid \psi) \mid \psi \in \Psi \}$ be the corresponding family of marginal distributions of the $N \times N$ stochastic matrix $\tilde{P}(\sigma)$. If the sampling rule is such that, for $n = 1, 2, \ldots$, it is possible to observe a sample of size $n$ under the fixed policy $\sigma$, and if the likelihood of any sample observed under the policy $\sigma$ does not depend on elements of $\tilde{P}$ not in $\tilde{P}(\sigma)$, then $\mathcal{H}_\sigma$ is also closed under the given sampling rule.

Proof. Let $\ell(x_n \mid P)$ be the likelihood function corresponding to the given sampling rule and let $\ell_\sigma(x_n \mid P(\sigma))$ be the likelihood of the sample $x_n$ from the Markov chain governed by $P(\sigma)$. The hypotheses of the theorem imply that

$$\ell(x_n \mid P) = \ell_\sigma(x_n \mid P(\sigma)) \tag{2.4.3}$$

for all samples $x_n$ from the chain governed by $P(\sigma)$. Let $\mathcal{S}_\sigma$ be the range set of the $(K - N) \times N$ generalized stochastic matrix formed by deleting from $P$ all rows $p_i^{k}$ such that $k = \sigma_i$ $(i = 1, \ldots, N)$. Then, if $F_\sigma(P \mid \psi')$ is the marginal prior distribution of $\tilde{P}(\sigma)$ and if the sample $x_n$ is observed, the posterior distribution of $\tilde{P}(\sigma)$ is $F_\sigma(P \mid \psi', x_n)$, where

$$dF_\sigma(P \mid \psi', x_n) = \frac{\ell_\sigma(x_n \mid P(\sigma)) \int_{\mathcal{S}_\sigma} dH(P \mid \psi')}{\int \ell(x_n \mid P)\, dH(P \mid \psi')} = \int_{\mathcal{S}_\sigma} dH(P \mid \psi'') = dF_\sigma(P \mid \psi'') \tag{2.4.4}$$

for all $\psi' \in \Psi$, where $\psi''$ is defined by equation (2.1.2). Thus, $F_\sigma(P \mid \psi', x_n) \in \mathcal{H}_\sigma$ and $\mathcal{H}_\sigma$ is closed under the sampling rule. Q.E.D.

The next theorem deals with the continuity of the expectation

$$I(\psi) = \int g(P)\, dH(P \mid \psi)$$

when regarded as a function of $\psi$, where $g(P)$ is any integrable function of $P$.

A distribution function, $H(P \mid \psi)$, is said to be continuous in $\psi$ at a point $P \in \mathcal{S}_{K,N}$ if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that, for any fixed $\psi$, $|H(P \mid \psi) - H(P \mid \psi')| < \varepsilon$ whenever $\|\psi - \psi'\| < \delta$; that is, if $H(P \mid \psi)$ is a continuous function of $\psi$ at the fixed value of $P$.

Definition. Let $\mathcal{H}$ be a family of distribution functions indexed by $\psi \in \Psi$. $\mathcal{H}$ is said to be a family of distributions continuous in $\psi$ if, whenever $H(P \mid \psi) \in \mathcal{H}$, $H(P \mid \psi)$ is a continuous function of $\psi$ at each of its continuity points, $P$.



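Theorem 2.4.3 Let $\mathcal{H}$ be a family of distribution functions indexed by $\psi \in \Psi$ which is continuous in $\psi$ and let $g(P)$ be any integrable function of $P$ defined on a set $S \subset \mathcal{S}_{K,N}$. If $I(\psi)$ is a function of $\psi$ defined by the integral

$$I(\psi) = \int_{S} g(P)\, dH(P \mid \psi), \tag{2.4.5}$$

then $I(\psi)$ is continuous on $\Psi$.

Proof. Let $\psi$ be fixed and let $\{\psi_n\}$ be any sequence of points of $\Psi$ which converges to $\psi$, where $\psi_n \ne \psi$ $(n = 1, 2, \ldots)$. Let $H(P \mid \psi_n)$ be the corresponding sequence of distribution functions from $\mathcal{H}$. Since $\mathcal{H}$ is continuous in $\psi$,

$$\lim_{n \to \infty} H(P \mid \psi_n) = H(P \mid \psi)$$

at every continuity point of $H(P \mid \psi)$ and, by the Helly-Bray Theorem,

$$\lim_{n \to \infty} \int_{S} g(P)\, dH(P \mid \psi_n) = \int_{S} g(P)\, dH(P \mid \psi)$$

and, therefore, $I(\psi)$ is continuous at $\psi$. Q.E.D.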

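Corollary 2.4.4 Let $\mathcal{H} = \{ H(P \mid \psi) \mid \psi \in \Psi \}$ be a family of distribution functions indexed by $\psi \in \Psi$ which is continuous in $\psi$ and, for a fixed policy $\sigma$, let $\mathcal{H}_\sigma$ be the corresponding family of marginal distribution functions, $F_\sigma(P \mid \psi)$. Then $\mathcal{H}_\sigma$ is a family of distributions continuous in $\psi$.

Proof. In Theorem 2.4.3, let $g(P) = 1$, $P \in \mathcal{S}_{K,N}$, and, for fixed $P(\sigma) \in \mathcal{S}_{N,N}$, let

$$S = \{\, Q \mid Q \in \mathcal{S}_{K,N};\ q_{ij}^{\sigma_i} \le p_{ij}^{\sigma_i}\ (i, j = 1, \ldots, N) \,\}. \tag{2.4.6}$$

Then equation (2.4.5) becomes

$$I(\psi) = F_\sigma(P \mid \psi)$$

and $F_\sigma(P \mid \psi)$ is a continuous function of $\psi$ at $P$ for any $P \in \mathcal{S}_{N,N}$. Q.E.D.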

Cf., for example, Loève [29], pp. 180-182.

Corollary 2.4.5 Let $\mathcal{H} = \{ H(P \mid \psi) \mid \psi \in \Psi \}$ be a family of distributions indexed by $\psi$. Suppose that, for every $H(P \mid \psi) \in \mathcal{H}$, a corresponding density function $h(P \mid \psi)$ exists and that $h(P \mid \psi)$ is a continuous function of $\psi$ for any $P \in \mathcal{S}_{K,N}$. Then $\mathcal{H}$ is a family of distributions continuous in $\psi$.

Proof. The corollary follows immediately from a well-known theorem of integral calculus which states that, if $h(P \mid \psi)$ is continuous in $\psi$, then

$$H(P \mid \psi) = \int_{S} h(Q \mid \psi)\, dQ$$

is a continuous function of $\psi$, where, for a fixed $P \in \mathcal{S}_{K,N}$, $S$ is the set of points $Q \in \mathcal{S}_{K,N}$ over which the distribution function at $P$ is accumulated.

It is clear from equations (2.2.1) and (2.2.2) that the matrix beta density function, $f_{M\beta}^{(K,N)}(P \mid M)$, is a continuous function of the $K \times N$ matrix $M$.



CHAPTER 3 
ADAPTIVE CONTROL PROBLEMS

3.1 Discounted Processes. Formulation.

Consider a Markov chain with alternatives in which the process, assumed to operate indefinitely, is sampled after each transition; that is, the decision-maker knows the state of the process after each transition. Information about $\tilde{P}$ is gathered in this manner and the decision-maker may alter the current policy at any time, as dictated by his state of knowledge about $\tilde{P}$. Such a process is an adaptive control process.

It is assumed that any sampling costs are included in the transition reward matrix, $\mathcal{R} = [r_{ij}^{k}]$. This implies that either the sampling costs are negligible when compared with the transition rewards or that the process is operated in such a manner that a sampling cost must be incurred after each transition. Models in which the decision-maker may choose to sample or not to sample will be considered in Chapter 5.

When future rewards are discounted to a present value we shall speak of a discounted adaptive control process. It is this class of problems which will be discussed, for the most part, in the present chapter. The interval between two consecutive transitions is assumed to be constant and can be taken as the time unit. Let $\beta$ be the present value of a unit reward earned one unit of time in the future $(0 \le \beta < 1)$. Since the present value of the maximum possible reward on the $n$th transition in the future decreases as $\beta^{n}$, it is clear that the total discounted reward earned over an infinite period under any sequence of policies is finite. A natural criterion to use in choosing policies is, therefore, the expected total discounted reward over an infinite period and we shall define the discounted adaptive control problem to be the problem of selecting a sequence of policies so as to maximize this quantity.

In the present section the discounted adaptive control problem is formulated in terms of a set of simultaneous functional equations. It is shown in the following section that there exists a unique bounded set of continuous solutions to these equations. In Section 3.3 a method of successive approximations is described which converges monotonically and uniformly to this unique set of solutions and the question of policy convergence is considered. The concept of recursive computation is then introduced and a numerical example is presented. The chapter concludes with a discussion of the problems involved in treating undiscounted adaptive control processes in a Markov chain.

A specific form of the discounted adaptive control problem, the two-armed bandit problem, was treated by Bellman [7] in 1956, using dynamic programming and a beta prior distribution. The method was generalized by Bellman and Kalaba [8] and is summarized by Bellman in Chapter 16 of Adaptive Control Processes [6]. Bellman's method of solution is based upon the use of successive approximations.

Cozzolino [13] applied Bellman's formulation of the two-armed bandit problem to the case of a two-state Markov chain with two alternatives in each state, assuming a matrix beta prior distribution. He mapped decision regions in the parameter space of the prior distribution for the special case of one unknown transition probability vector. Cozzolino, Gonzalez-Zubieta, and Miller [14] have recently suggested various heuristic treatments of the discounted adaptive control problem, basing their results on simulation studies. Freimer [18, 19] has obtained a solution of the discounted adaptive control problem in the case of quadratic cost functions by reducing the stochastic formulation to a deterministic one in terms of certainty equivalents.

The functional equations formulated in this chapter generalize the results of these authors and, in spirit, follow Bellman's derivation [6]. Our contribution to the treatment of this problem consists of the following:

a. Proof of the existence of a unique bounded set of continuous solutions to these functional equations.

b. Derivation of a method of successive approximations which converges monotonically and uniformly to this unique set of solutions.

c. Introduction of recursive computation techniques for the numerical solution of the discounted adaptive control problem.

Let the prior distribution of $\tilde{P}$, $H(P \mid \psi)$, be a member of a family, $\mathcal{H}$, indexed by $\psi \in \Psi$. The ordered pair, $(i, \psi)$, where $i = 1, \ldots, N$ and $\psi \in \Psi$, can be regarded as the generalized state of the system. Here, $i$ is the physical state of the system and $\psi$ summarizes (or, more precisely, indexes) the decision-maker's state of knowledge about $\tilde{P}$. Since the process is to be sampled consecutively, it must be assumed that $\mathcal{H}$ is closed under the consecutive sampling rule in order that we may meaningfully refer to $\psi$ as indexing the decision-maker's state of knowledge as sampling progresses.

Let $v_i(\psi)$ denote the supremum of the expected discounted reward over an infinite period when the system starts from the generalized state, $(i, \psi)$. If $R = \max_{i,j,k} \{ |r_{ij}^{k}| \}$, the expected total reward under any sampling strategy is bounded by

$$\sum_{n=0}^{\infty} \beta^{n} R = \frac{R}{1 - \beta} \tag{3.1.1}$$

and, therefore, $v_i(\psi)$ exists for $i = 1, \ldots, N$ and all $\psi \in \Psi$. It will be shown at the conclusion of this section that $v_i(\psi)$ is attained under an optimal sampling strategy and, hence, can be regarded as the maximum expected discounted reward when the system starts from $(i, \psi)$.
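The geometric bound on the discounted reward can be checked on any finite reward sequence. This is a minimal sketch with hypothetical function names and a hypothetical reward sequence; it illustrates the bound $R/(1-\beta)$ only.

```python
def discounted_total(rewards, beta):
    """Present value of a reward sequence, the n-th reward weighted by beta**n."""
    return sum(r * beta ** n for n, r in enumerate(rewards))


def reward_bound(r_max, beta):
    """Geometric-series bound R / (1 - beta) for 0 <= beta < 1."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return r_max / (1 - beta)


rewards = [3, -2, 5, 1]
total = discounted_total(rewards, beta=0.9)
bound = reward_bound(max(abs(r) for r in rewards), beta=0.9)
```

Even a sequence that earns the largest reward at every step stays below the bound, since its partial sums are $R(1 - \beta^{n})/(1 - \beta)$.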

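If, when in state $(i, \psi)$, it is decided to choose the $k$th alternative and the system makes a transition to state $j$, the supremum of the posterior expected discounted reward is

$$r_{ij}^{k} + \beta\, v_j(T_{ij}^{k}(\psi)). \tag{3.1.2}$$

The probability of the sample outcome $j$, unconditional with regard to the prior distribution of $\tilde{P}$, given that the system is in state $(i, \psi)$ and that alternative $k$ is in use, is

$$\bar{p}_{ij}^{k}(\psi) = \int p_{ij}^{k}\, dH(P \mid \psi), \tag{3.1.3}$$

the marginal prior expectation of $\tilde{p}_{ij}^{k}$. Let

$$q_i^{k}(\psi) = \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, r_{ij}^{k}, \qquad k = 1, \ldots, K_i; \quad i = 1, \ldots, N \tag{3.1.4}$$

denote the mean one-step transition reward when the system is in state $(i, \psi)$ and alternative $k$ is used. Then, regarding each $v_i(\psi)$ as a function of $\psi$ defined on $\Psi$ $(i = 1, \ldots, N)$, the supremum of the discounted expected reward when starting from $(i, \psi)$ must satisfy the following set of simultaneous functional equations:

$$v_i(\psi) = \max_{1 \le k \le K_i} \Big\{ q_i^{k}(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, v_j(T_{ij}^{k}(\psi)) \Big\}, \qquad i = 1, \ldots, N; \quad 0 \le \beta < 1. \tag{3.1.5}$$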
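A truncated version of these functional equations can be evaluated by direct recursion when the prior is matrix beta, an assumption made here so that the marginal expectation is $m_{ij}^{k} / \sum_j m_{ij}^{k}$ and the posterior parameter after a transition from $i$ to $j$ under alternative $k$ is obtained by adding one to $m_{ij}^{k}$. The sketch below starts from a zero terminal value and looks a fixed number of transitions ahead; the function name, the data layout, and the zero terminal value are illustrative assumptions, not the recursive computation scheme developed later in the chapter.

```python
def value(i, m, rewards, beta, depth):
    """Approximate v_i for a matrix beta prior, looking `depth` transitions ahead.

    m, rewards -- dictionaries mapping (state, alternative) to a list of N
                  numbers (prior parameters and transition rewards);
                  states are numbered from 0.
    """
    if depth == 0:
        return 0.0  # assumed terminal value for the truncated problem
    best = None
    for (state, k), row in m.items():
        if state != i:
            continue
        total = sum(row)
        expected = 0.0
        for j, m_ij in enumerate(row):
            p_bar = m_ij / total  # marginal expectation of p_ij^k
            updated = dict(m)
            updated[(state, k)] = row[:j] + [m_ij + 1] + row[j + 1:]
            expected += p_bar * (rewards[(state, k)][j]
                                 + beta * value(j, updated, rewards, beta, depth - 1))
        best = expected if best is None else max(best, expected)
    return best


# Every transition earns 1, so three steps are worth 1 + beta + beta**2.
flat_m = {(0, 0): [1, 1], (1, 0): [1, 1]}
flat_r = {(0, 0): [1, 1], (1, 0): [1, 1]}
three_step = value(0, flat_m, flat_r, beta=0.5, depth=3)

# One step ahead, state 0 chooses between two alternatives.
m2 = {(0, 0): [1, 1], (0, 1): [1, 3], (1, 0): [1, 1]}
r2 = {(0, 0): [1, 0], (0, 1): [0, 2], (1, 0): [0, 0]}
one_step = value(0, m2, r2, beta=0.5, depth=1)
```

The number of generalized states grows by a factor of up to $N$ times the number of alternatives at each level, so this direct recursion is practical only for short horizons.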



«4d» 

We now consider the existence of the maximum expected discounted
reward over an infinite period. In order to do this, it is necessary to
precisely define the notion of a sampling strategy for an adaptive control
process.

Let the policies $\sigma \in \Sigma$ be indexed by the integers $0$ through
$J-1$, where $J$ is the number of elements in $\Sigma$. Thus,
$\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{J-1}\}$.

Suppose the system starts from the generalized state $(i_0, \psi)$ and that
alternative $k$ has been selected in state $i_0$. We can, before the first
transition occurs, decide which alternative to use in each state $j$ for
the second transition. This consists of the choice of a policy,
$\sigma \in \Sigma$, and can be denoted by $d_1(i_0, k) = a_1$, a function
with range $\{0, 1, \ldots, J-1\}$.

In general, before any transitions have occurred, we can prescribe a
policy to be used immediately after the $n$th transition ($n = 1, 2, \ldots$).
Let $s_{n-1} = (i_0, i_1, \ldots, i_{n-1})$ be a possible sample history of
the first $n-1$ transitions and let
$\mathbf{a}_{n-1} = (k, \sigma_{a_1}, \ldots, \sigma_{a_{n-1}})$ be the
sequence of policies under which the sample $s_{n-1}$ occurred, together with
$\sigma_{a_{n-1}}$, the policy under which the $n$th transition will occur.
The policy history, $\mathbf{a}_{n-1}$, is determined by evaluating the
decision functions $d_1(i_0, k)$, $d_2(s_1, \mathbf{a}_1)$, $\ldots$,
$d_{n-1}(s_{n-2}, \mathbf{a}_{n-2})$ at $s_1 = (i_0, i_1)$, $\ldots$,
$s_{n-2} = (i_0, i_1, \ldots, i_{n-2})$. Conditional on the Markov chain
having arrived in state $i_{n-1}$ with sample history $s_{n-1}$ and policy
history $\mathbf{a}_{n-1}$, we select an alternative for use in state $j$
after the $n$th transition for each of the $N$ states to which the $n$th
transition may bring the system. This consists of the selection of a policy,
$\sigma \in \Sigma$, and is denoted by
$d_n(s_{n-1}, \mathbf{a}_{n-1}) = a_n$, a function with range
$\{0, \ldots, J-1\}$. Since there are $N^{n-1}$ different sample histories,
$s_{n-1}$, which start from a fixed state $i_0$, it is necessary to specify
$N^{n-1}$ values of the $n$th-level decision function
$d_n(s_{n-1}, \mathbf{a}_{n-1})$. The specification of a complete set of
decision functions, $d_n(s_{n-1}, \mathbf{a}_{n-1})$, for $n = 1, 2, \ldots$
and for all possible sample histories, together with the choice of an initial
alternative in state $i_0$, constitutes a sampling strategy, $d$. Let $D_i$
denote the set of all possible sampling strategies when the system starts
in state $i$.

In the following theorem it is shown that $v_i(\psi)$, a least upper
bound, is attained for some strategy $d^* \in D_i$. This is done by mapping
the space of strategies $D_i$ onto a compact subset of the real line and
showing that the corresponding mapping of $v_i(\psi, d)$, the expected
discounted reward under strategy $d$, is continuous on this set.

Theorem 3.1.1  Let $v_i(\psi, d)$ be the total expected discounted
reward in a Markov chain with alternatives when the process starts from
generalized state $(i, \psi)$ and the strategy $d$ is used. Let

    $v_i(\psi) = \sup_{d \in D_i} v_i(\psi, d)$.                         (3.1.6)

Then there is a strategy $d^* \in D_i$ such that

    $v_i(\psi) = v_i(\psi, d^*)$.                                         (3.1.7)

Proof.  For $n = 1, 2, 3, \ldots$ let $x_n = (i_1, i_2, \ldots, i_n)$, where
$i_\nu = 1, \ldots, N$ for $\nu = 1, \ldots, n$. To each sequence $x_n$ let
there correspond the $N$-ary number

    $z(x_n) = \sum_{\nu=1}^{n} (i_\nu - 1)\, N^{-\nu}$.                   (3.1.8)

For fixed $n$ and $i_0 = i$ we may then order the $N^{n-1}$ distinct
$n$th-level decision functions as follows: the decision function reached by
the history $x_{n-1}$ precedes the one reached by the history $x'_{n-1}$
if and only if $z(x_{n-1}) < z(x'_{n-1})$.

Consider a strategy $d \in D_i$. Let the value of the $j$th member of the
$\mu(n) = N^{n-1}$ decision functions of level $n$ in $d$ be denoted $d_{nj}$
($n = 1, 2, \ldots$) and assume that the indexing is such that

    $d_{n1} \prec d_{n2} \prec \cdots \prec d_{n\mu(n)}$,  $n = 1, 2, \ldots$   (3.1.9)

This can be done since the $\mu(n)$ possible sample histories which
lead to the $n$th-level decision functions are all distinct. The strategy
$d$ can then be displayed as the ordered pair

    $d = (k, \delta)$                                                     (3.1.10)

where $k$ is the initial alternative selected in state $i$ and $\delta$ is the
sequence

    $\delta = (d_{11}, d_{21}, \ldots, d_{2\mu(2)}, d_{31}, \ldots)$.     (3.1.11)

Letting $\Delta$ denote the set of all possible sequences $\delta$, we have

    $v_i(\psi) = \max_{1 \le k \le K_i} \sup_{\delta \in \Delta} v_i(\psi; k, \delta)$,   (3.1.12)

where, if $d = (k, \delta)$, $v_i(\psi; k, \delta) = v_i(\psi, d)$.

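To each $\delta \in \Delta$, let there correspond the $J$-ary number

    $y(\delta) = \sum_{\alpha=1}^{\infty} \delta_\alpha J^{-\alpha}$,    (3.1.13)

where $\delta_\alpha \in \{0, 1, \ldots, J-1\}$ for $\alpha = 1, 2, \ldots$.
If $\delta = (0, 0, 0, \ldots)$ then $y(\delta) = 0$, and if
$\delta = (J-1, J-1, J-1, \ldots)$ then

    $y(\delta) = (J-1) \sum_{\alpha=1}^{\infty} J^{-\alpha} = 1$.        (3.1.14)

Thus, for any $\delta \in \Delta$, $0 \le y(\delta) \le 1$. Moreover, it is
easily seen that the mapping (3.1.13) is a one-to-one mapping of the set
$\Delta$ onto the closed interval $[0,1]$. Let $v_i^{k}(\psi, y)$ be a
function defined on $[0,1]$ by the relation

    $v_i^{k}(\psi, y(\delta)) = v_i(\psi; k, \delta)$,
        $k = 1, \ldots, K_i$;  $i = 1, \ldots, N$.                        (3.1.15)

Then (3.1.12) can be written

    $v_i(\psi) = \max_{1 \le k \le K_i} \sup_{0 \le y \le 1} \{ v_i^{k}(\psi, y) \}$.   (3.1.16)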
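
The mapping (3.1.13) can be illustrated with a short sketch. The Python below is illustrative only: the helper name `y_of_delta` is hypothetical, and it evaluates a finite prefix of $\delta$ (all later terms are taken to be zero), whereas the text works with infinite sequences. Exact rational arithmetic is used so that the values can be compared without rounding.

```python
from fractions import Fraction


def y_of_delta(delta, J):
    """Sum of delta[alpha-1] * J**(-alpha) over a finite prefix of delta.

    delta : sequence of policy indices, each in {0, ..., J-1}
    J     : number of policies (the radix of the expansion)
    """
    if any(not (0 <= d < J) for d in delta):
        raise ValueError("each entry of delta must lie in 0..J-1")
    # enumerate starts at 0, so the exponent is alpha + 1
    return sum(Fraction(d, J ** (alpha + 1)) for alpha, d in enumerate(delta))


J = 3
print(y_of_delta([0, 0, 0], J))   # 0
print(y_of_delta([2, 2, 2], J))   # 26/27, which is 1 - 3**-3
```

Two prefixes that agree in their first $\nu$ entries map to points that differ by at most $J^{-\nu}$, which is the property the continuity argument relies on.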

We now show that, for fixed $k$, $v_i^{k}(\psi, y)$ is continuous in $y$. Let

R * e i^k {Hy|} • O.** 1 "' 

and 

»*-i3«{l4l]* (3 - i - i7b> 

Let $\epsilon > 0$ be given and choose a positive integer $\nu$ such that

* o 
p *l Kj^p < t # (3.1.18) 

JL «• p 

For a fixed $y \in [0,1]$ let $y'$ be any number such that $0 \le y' \le 1$ and
$|y - y'| < J^{-\nu}$. Then, if $y = y(\delta)$ and $y' = y(\delta')$, we have

e * 6 f <a»i f 2, ... t v (3.1- 



|^<t*y> - Jfct* 9 ) |* s pV- ?) 



SPVfrl 



■ p* 1 " 1 &&£ < c (3.1.29) 



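Thus, $v_i^{k}(\psi, y)$ is a continuous function of $y$ on the compact set
$[0,1]$ and, for each $k$, there exists a $y_k \in [0,1]$ such that

    $v_i^{k}(\psi, y_k) = \sup_{0 \le y \le 1} \{ v_i^{k}(\psi, y) \}$.  (3.1.21)

Letting $\delta^*(k)$ denote the inverse image of $y_k = y(\delta^*(k))$,

    $v_i(\psi) = \max_{1 \le k \le K_i} \{ v_i(\psi; k, \delta^*(k)) \}$  (3.1.22)

and there exists a strategy, $d^* = (k^*, \delta^*(k^*))$, such that

    $v_i(\psi) = v_i(\psi, d^*)$.                                         (3.1.23)

Q.E.D.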



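In order to show the existence of a unique bounded set of continuous
solutions to the functional equations (3.1.5) we shall make use of the
method of successive approximations. Let the functions $v_i(n, \psi)$ be
defined recursively as follows:

    $v_i(n+1, \psi) = \max_{1 \le k \le K_i} \{ \bar{q}_{i}^{k}(\psi)
        + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, v_j(n, T_{ij}^{k}(\psi)) \}$,
        $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$                        (3.2.1)

and

    $v_i(0, \psi) = V_i(\psi)$,                                           (3.2.2)

where $\{V_i(\psi)\}$ is a set of bounded terminal functions.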

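It will be convenient to introduce some additional notation. Let

    $S_i^{k}(v, n, \psi) = \bar{q}_{i}^{k}(\psi)
        + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, v_j(n, T_{ij}^{k}(\psi))$,
        $k = 1, \ldots, K_i$;  $i = 1, \ldots, N$;  $\psi \in \Psi$       (3.2.3)

and

    $S_i^{k}(v, \infty, \psi) = \bar{q}_{i}^{k}(\psi)
        + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, v_j(T_{ij}^{k}(\psi))$.   (3.2.4)

Equation (3.2.1) can then be written

    $v_i(n+1, \psi) = \max_{1 \le k \le K_i} \{ S_i^{k}(v, n, \psi) \}$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$                              (3.2.5)

and, analogously, equation (3.1.5) becomes

    $v_i(\psi) = \max_{1 \le k \le K_i} \{ S_i^{k}(v, \infty, \psi) \}$,
        $i = 1, \ldots, N$.                                               (3.2.6)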

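The first result is a lemma which will be used in subsequent proofs.

Lemma 3.2.1  If $V$ is a bound for the terminal functions $V_i(\psi)$,

    $|V_i(\psi)| \le V$,  $i = 1, \ldots, N$;  $\psi \in \Psi$            (3.2.7)

and if

    $R = \max_{i,j,k} |r_{ij}^{k}|$,                                      (3.2.8)

then the functions $v_i(n, \psi)$ are bounded,

    $|v_i(n, \psi)| \le \beta^{n} V + \frac{1-\beta^{n}}{1-\beta} R$,
        $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$                        (3.2.9)

Proof.  The proof is inductive. Equation (3.2.9) obviously holds
for $n = 0$. Assume it holds for $n$. Then, if $k = a$ maximizes the right
side of (3.2.1),

    $|v_i(n+1, \psi)| \le |\bar{q}_{i}^{a}(\psi)|
        + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{a}(\psi)\, |v_j(n, T_{ij}^{a}(\psi))|$
    $\le R + \beta [ \beta^{n} V + \frac{1-\beta^{n}}{1-\beta} R ]$
    $= \beta^{n+1} V + \frac{1-\beta^{n+1}}{1-\beta} R$.

Q.E.D.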



Theorem 3.2.2  If the set of functions $\{v_i(n, \psi)\}$ is defined by
equations (3.2.1) and (3.2.2), then the limits

    $\lim_{n \to \infty} v_i(n, \psi) = v_i(\psi)$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$                              (3.2.11)

exist and $\{v_i(\psi)\}$ is a set of solutions to equation (3.1.5).
Moreover, the convergence is uniform in $\psi$.

Proof.  It will be established inductively that, for arbitrary
positive integers $n$ and $m$,

    $|v_i(n, \psi) - v_i(m, \psi)| \le (\beta^{n} + \beta^{m}) V
        + \frac{|\beta^{n} - \beta^{m}|}{1-\beta} R$,
        $i = 1, \ldots, N$;  $n, m = 1, 2, \ldots$;  $\psi \in \Psi$      (3.2.12)

where $V$ is a bound on the terminal functions. Since $0 \le \beta < 1$, it
then follows by the Cauchy criterion that $\lim_{n \to \infty} v_i(n, \psi)$
exists for $i = 1, \ldots, N$. That the limiting functions satisfy (3.1.5)
follows by allowing $n$ to go to $\infty$ in equation (3.2.1). Uniform
convergence of the sequence of functions $\{v_i(n, \psi)\}$ follows on noting
that the bound (3.2.12) is independent of $\psi$.
f. 

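To establish (3.2.12) we proceed as follows. Using the formulation
(3.2.5), for any fixed $\psi \in \Psi$ let

    $v_i(n, \psi) = S_i^{a}(v, n-1, \psi) = \max_k \{ S_i^{k}(v, n-1, \psi) \}$,
    $v_i(m, \psi) = S_i^{b}(v, m-1, \psi) = \max_k \{ S_i^{k}(v, m-1, \psi) \}$.

Then

    $v_i(n, \psi) - v_i(m, \psi) = S_i^{a}(v, n-1, \psi) - S_i^{b}(v, m-1, \psi)$
        $\le S_i^{a}(v, n-1, \psi) - S_i^{a}(v, m-1, \psi)$

and, similarly,

    $v_i(n, \psi) - v_i(m, \psi) \ge S_i^{b}(v, n-1, \psi) - S_i^{b}(v, m-1, \psi)$.

Let $k^*$ index the maximum of $|S_i^{a}(v, n-1, \psi) - S_i^{a}(v, m-1, \psi)|$
and $|S_i^{b}(v, n-1, \psi) - S_i^{b}(v, m-1, \psi)|$. Then

    $|v_i(n, \psi) - v_i(m, \psi)|
        \le |S_i^{k^*}(v, n-1, \psi) - S_i^{k^*}(v, m-1, \psi)|$
    $\le \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k^*}(\psi)\,
        |v_j(n-1, T_{ij}^{k^*}(\psi)) - v_j(m-1, T_{ij}^{k^*}(\psi))|$,
        $i = 1, \ldots, N$;  $n, m = 1, 2, \ldots$                        (3.2.13)

Assuming that $n > m$, Lemma 3.2.1 implies the inequality

    $|v_i(n-m, \psi) - v_i(0, \psi)| \le (1 + \beta^{n-m}) V
        + \frac{1-\beta^{n-m}}{1-\beta} R$.                               (3.2.14)

An inductive argument, using (3.2.13), shows that

    $|v_i(n, \psi) - v_i(m, \psi)| \le (\beta^{m} + \beta^{n}) V
        + \frac{\beta^{m} - \beta^{n}}{1-\beta} R$.                       (3.2.15)

A similar argument in the case $n < m$ yields (3.2.12).  Q.E.D.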

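Theorem 3.2.3  There exists a unique set of bounded functions
$\{v_i(\psi)\}$ which satisfies the set of equations

    $v_i(\psi) = \max_{1 \le k \le K_i} \{ \bar{q}_{i}^{k}(\psi)
        + \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k}(\psi)\, v_j(T_{ij}^{k}(\psi)) \}$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$.                             (3.2.16)

Proof.  Theorem 3.2.2 established the existence of at least one set
of functions $\{v_i(\psi)\}$ which satisfy (3.2.16). Lemma 3.2.1 implies that
this set of functions is bounded:

    $|v_i(\psi)| = \lim_{n \to \infty} |v_i(n, \psi)| \le \frac{R}{1-\beta}$,
        $i = 1, \ldots, N$.                                               (3.2.17)

To establish uniqueness, assume that there exist two sets of bounded
functions, $\{v_i(\psi)\}$ and $\{w_i(\psi)\}$, which satisfy (3.2.16). For
any $i$ and an arbitrary $\psi \in \Psi$, let

    $v_i(\psi) = S_i^{a}(v, \infty, \psi)$,
    $w_i(\psi) = S_i^{b}(w, \infty, \psi)$.

Then, arguing as in the proof of Theorem 3.2.2,

    $S_i^{b}(v, \infty, \psi) - S_i^{b}(w, \infty, \psi)
        \le v_i(\psi) - w_i(\psi)
        \le S_i^{a}(v, \infty, \psi) - S_i^{a}(w, \infty, \psi)$.         (3.2.18)

Letting $k^*$ index the maximum of
$|S_i^{a}(v, \infty, \psi) - S_i^{a}(w, \infty, \psi)|$ and
$|S_i^{b}(v, \infty, \psi) - S_i^{b}(w, \infty, \psi)|$,

    $|v_i(\psi) - w_i(\psi)| \le \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k^*}(\psi)\,
        |v_j(T_{ij}^{k^*}(\psi)) - w_j(T_{ij}^{k^*}(\psi))|$.             (3.2.19)

Since $v_i(\psi)$ and $w_i(\psi)$ are both bounded functions of $\psi$, there
exists a number, $M > 0$, such that

    $|v_i(\psi) - w_i(\psi)| \le M$,  $i = 1, \ldots, N$.                 (3.2.20)

Repeated application of (3.2.19) then yields the inequality

    $|v_i(\psi) - w_i(\psi)| \le \beta^{n} M$,  $n = 0, 1, 2, \ldots$     (3.2.21)

Since $0 \le \beta < 1$, it then follows that

    $v_i(\psi) = w_i(\psi)$,  $i = 1, \ldots, N$;  $\psi \in \Psi$.       (3.2.22)

Q.E.D.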

Theorem 3.2.4  If $\{v_i(\psi)\}$ is the unique bounded set of functions
which satisfy equation (3.1.5) and if $\bar{q}_{i}^{k}(\psi)$ is a continuous
function of $\psi$ ($k = 1, \ldots, K_i$; $i = 1, \ldots, N$), then
$v_i(\psi)$ is a continuous function of $\psi$ ($i = 1, \ldots, N$).

Proof.  Consider the functions $v_i(n, \psi)$ defined by equations (3.2.1)
and (3.2.2). Choosing a set of terminal functions $\{V_i(\psi)\}$ each member
of which is continuous on $\Psi$, it follows inductively that $v_i(n, \psi)$
is continuous ($i = 1, \ldots, N$; $n = 0, 1, 2, \ldots$). By Theorem 3.2.2,
$\{v_i(n, \psi)\} \to v_i(\psi)$ uniformly and, therefore, $v_i(\psi)$ is
continuous.  Q.E.D.

3*3 flp J lliMi fl O J& T^nifflQ1ilfTffft va ift TOTrt T HlfclM - 

The functions $v_i(n, \psi)$ defined by equations (3.2.1) and (3.2.2) can
be used as successive approximations in the numerical calculation of
$v_i(\psi)$ at some fixed generalized state $(i, \psi)$. In this section we
derive conditions under which the sequence of functions $\{v_i(n, \psi)\}$
converges monotonically and find a bound for the error of the $n$th
approximant, $e_i(n, \psi) = v_i(\psi) - v_i(n, \psi)$. The section concludes
with a proof that the optimal sampling strategy of the $n$-step problem
defined by (3.2.1) and (3.2.2) converges to an optimal sampling strategy for
the infinite horizon problem defined by (3.1.5).

Theorem 3.3.1  Let the terminal functions of equation (3.2.2) be
constants,

    $V_i(\psi) = V_i$,  $i = 1, \ldots, N$;  $\psi \in \Psi$.             (3.3.1)

Let

    $V^* = \max_i V_i$,   $V_* = \min_i V_i$                              (3.3.2)

and

    $r = \min_{i,j,k} r_{ij}^{k}$,   $R = \max_{i,j,k} r_{ij}^{k}$.       (3.3.3)

If

    $V^* - \beta V_* \le r$                                               (3.3.4)

then, for $i = 1, \ldots, N$, the functions $v_i(n, \psi)$ defined by
equations (3.2.1) and (3.2.2) form a monotone increasing sequence which
converges uniformly to $v_i(\psi)$, the unique bounded solution of equation
(3.1.5). Similarly, if

    $V_* - \beta V^* \ge R$,                                              (3.3.5)

then the functions $v_i(n, \psi)$ form a monotone decreasing sequence which
converges uniformly to $v_i(\psi)$.

Proof.  We will first show inductively that, if (3.3.4) holds, the
sequences $\{v_i(n, \psi)\}$ are monotone increasing for each
$i = 1, \ldots, N$ and each $\psi \in \Psi$. Uniform convergence of
$\{v_i(n, \psi)\}$ to $v_i(\psi)$ has already been demonstrated. If equation
(3.3.4) is satisfied, then

    $v_i(1, \psi) - v_i(0, \psi) \ge r + \beta V_* - V^* \ge 0$,
        $i = 1, \ldots, N$.                                               (3.3.6)

Assume that $v_i(n, \psi) - v_i(n-1, \psi) \ge 0$ for $i = 1, \ldots, N$ and
$\psi \in \Psi$. If

    $v_i(n+1, \psi) = S_i^{a}(v, n, \psi)$

and

    $v_i(n, \psi) = S_i^{b}(v, n-1, \psi)$,

the inductive hypothesis implies that

    $v_i(n+1, \psi) - v_i(n, \psi) = S_i^{a}(v, n, \psi) - S_i^{b}(v, n-1, \psi)$
    $\ge S_i^{b}(v, n, \psi) - S_i^{b}(v, n-1, \psi)$
    $= \beta \sum_{j=1}^{N} \bar{p}_{ij}^{b}(\psi)\,
        [ v_j(n, T_{ij}^{b}(\psi)) - v_j(n-1, T_{ij}^{b}(\psi)) ]$
    $\ge 0$,  $i = 1, \ldots, N$                                          (3.3.7)

proving the induction. If (3.3.5) holds,

    $v_i(1, \psi) - v_i(0, \psi) \le R + \beta V^* - V_* \le 0$,
        $i = 1, \ldots, N$.                                               (3.3.8)

That $v_i(n+1, \psi) \le v_i(n, \psi)$ is then easily established by
induction in a manner similar to that displayed in equation (3.3.7).  Q.E.D.



Let the error of the $n$th approximant be defined as

    $e_i(n, \psi) = v_i(\psi) - v_i(n, \psi)$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$.                             (3.3.9)

If $\{v_i(n, \psi)\}$ is a sequence of functions which converges
monotonically to $v_i(\psi)$ then $\{e_i(n, \psi)\}$ is a sequence of
functions which converges monotonically to zero. In this case, if
$\epsilon > 0$ is an error bound which is acceptable to the decision-maker
and if $n^*$ is the smallest positive integer such that

    $|e_i(n^*, \psi)| \le \epsilon$,                                      (3.3.10)

then $v_i(n^*, \psi)$ is an acceptable approximation to $v_i(\psi)$ and the
sampling strategy resulting in $v_i(n^*, \psi)$ is an acceptable
approximation to the first $n^*$ levels of an optimal sampling strategy.

It is not necessary to require that the successive approximants
$v_i(n, \psi)$ converge monotonically to $v_i(\psi)$ and, in fact, a
non-monotonic sequence $\{v_i(n, \psi)\}$ may converge more rapidly than a
monotonic sequence. Theorem 3.3.3 provides a bound for $e_i(n, \psi)$
assuming nothing about monotonicity. A lemma is first required.



Lemma 3.3.2  Let $r$ and $R$ be defined by equation (3.3.3). Then
$v_i(\psi)$, the solution of (3.1.5), has the bounds

    $\frac{r}{1-\beta} \le v_i(\psi) \le \frac{R}{1-\beta}$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$.                             (3.3.11)

Proof.  The mean reward per transition under any policy has the
bounds

    $r \le \bar{q}_{i}^{k}(\psi) \le R$,
        $k = 1, \ldots, K_i$;  $i = 1, \ldots, N$.                        (3.3.12)

Since the expected discounted reward over an infinite period under any
strategy is the sum of the expected rewards at each transition, the
reward of the $n$th transition being weighted by $\beta^{n}$, the maximum
total reward over all strategies has the bounds

    $r \sum_{n=0}^{\infty} \beta^{n} \le v_i(\psi) \le R \sum_{n=0}^{\infty} \beta^{n}$,
        $i = 1, \ldots, N$;  $\psi \in \Psi$                              (3.3.13)

from which (3.3.11) follows.  Q.E.D.

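Theorem 3.3.3  Let $\{v_i(n, \psi)\}$ be a sequence of successive
approximations defined by equations (3.2.1) and (3.2.2), with constant
terminal reward functions,

    $V_i(\psi) = V_i$,  $i = 1, \ldots, N$;  $\psi \in \Psi$.             (3.3.14)

Let $V^*$, $V_*$, $r$, and $R$ be defined by (3.3.2) and (3.3.3). Then the
error of the $n$th approximant has the bound

    $|e_i(n, \psi)| \le \beta^{n} \max \{ \frac{R}{1-\beta} - V_*,\;
        V^* - \frac{r}{1-\beta} \}$,
        $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$;  $0 \le \beta < 1$.   (3.3.15)

Proof.  The proof is inductive. For $n = 0$,

    $e_i(0, \psi) = v_i(\psi) - V_i$.

Suppose $v_i(\psi) \ge V_i$. Then, by Lemma 3.3.2,

    $|e_i(0, \psi)| = v_i(\psi) - V_i \le \frac{R}{1-\beta} - V_*$,
        $i = 1, \ldots, N$.                                               (3.3.16)

If, on the other hand, $v_i(\psi) < V_i$, Lemma 3.3.2 implies

    $|e_i(0, \psi)| = V_i - v_i(\psi) \le V^* - \frac{r}{1-\beta}$,
        $i = 1, \ldots, N$.                                               (3.3.17)

Therefore, in either case,

    $|e_i(0, \psi)| \le \max \{ \frac{R}{1-\beta} - V_*,\;
        V^* - \frac{r}{1-\beta} \}$,  $i = 1, \ldots, N$.                 (3.3.18)

It is to be noted that at least one of the two terms, $\frac{R}{1-\beta} - V_*$
and $V^* - \frac{r}{1-\beta}$, is non-negative, for, if not, we have the
contradiction

    $\frac{R}{1-\beta} < V_* \le V^* < \frac{r}{1-\beta}$.

Having established that equation (3.3.15) is valid for $n = 0$, assume it
holds for $n$. Let

    $v_i(\psi) = S_i^{a}(v, \infty, \psi)$,
    $v_i(n+1, \psi) = S_i^{b}(v, n, \psi)$.

Then

    $S_i^{b}(v, \infty, \psi) - S_i^{b}(v, n, \psi) \le e_i(n+1, \psi)
        \le S_i^{a}(v, \infty, \psi) - S_i^{a}(v, n, \psi)$.              (3.3.19)

Let $k^*$ index the maximum of
$|S_i^{a}(v, \infty, \psi) - S_i^{a}(v, n, \psi)|$ and
$|S_i^{b}(v, \infty, \psi) - S_i^{b}(v, n, \psi)|$. Then

    $|e_i(n+1, \psi)| \le |S_i^{k^*}(v, \infty, \psi) - S_i^{k^*}(v, n, \psi)|$
    $\le \beta \sum_{j=1}^{N} \bar{p}_{ij}^{k^*}(\psi)\, |e_j(n, T_{ij}^{k^*}(\psi))|$
    $\le \beta^{n+1} \max \{ \frac{R}{1-\beta} - V_*,\;
        V^* - \frac{r}{1-\beta} \}$.                                      (3.3.20)

Q.E.D.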



Corollary 3.3.4  Let the terminal functions $V_i(\psi)$ be constants,

    $V_i(\psi) = V_i$,  $i = 1, \ldots, N$;  $\psi \in \Psi$.             (3.3.21)

Then, if

    $V^* - \beta V_* \le r$,                                              (3.3.22)

the error of the $n$th approximant has the bounds

    $0 \le e_i(n, \psi) \le \beta^{n} ( \frac{R}{1-\beta} - V_* )$,
        $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$                        (3.3.23)

Similarly, if

    $V_* - \beta V^* \ge R$,                                              (3.3.24)

$e_i(n, \psi)$ has the bounds

    $\beta^{n} ( \frac{r}{1-\beta} - V^* ) \le e_i(n, \psi) \le 0$,
        $i = 1, \ldots, N$;  $n = 0, 1, 2, \ldots$                        (3.3.25)

Proof.  If equation (3.3.22) is satisfied, Theorem 3.3.1 implies
$e_i(n, \psi) \ge 0$. Moreover, (3.3.22) implies that $V_*(1-\beta) \le R$
and $V^*(1-\beta) \le r$, hence that
$\frac{R}{1-\beta} - V_* \ge V^* - \frac{r}{1-\beta}$. Equation (3.3.15) then
yields the upper inequality of (3.3.23). Similarly, if equation (3.3.24) is
satisfied, then $V^*(1-\beta) \ge r$ and $V_*(1-\beta) \ge R$, hence
$V^* - \frac{r}{1-\beta} \ge \frac{R}{1-\beta} - V_*$. The bounds of (3.3.25)
then follow from Theorem 3.3.1 and equation (3.3.15).  Q.E.D.
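
The bound (3.3.15) is cheap to evaluate before any iteration is run, so it can be used to plan the number of iterations. The sketch below is illustrative only: the function names are hypothetical, and the numerical inputs ($\beta = 0.2$, $r = 3$, $R = 16$, zero terminal values) are the ones read from the example of Section 3.5.

```python
def error_bound(n, beta, r, R, v_min, v_max):
    """Right side of (3.3.15) for constant terminal values in [v_min, v_max]."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return beta ** n * max(R / (1 - beta) - v_min, v_max - r / (1 - beta))


def iterations_needed(eps, beta, r, R, v_min, v_max, n_max=1000):
    """Smallest n whose bound (3.3.15) does not exceed eps."""
    for n in range(n_max + 1):
        if error_bound(n, beta, r, R, v_min, v_max) <= eps:
            return n
    raise ValueError("bound did not fall below eps within n_max iterations")


for n in range(4):
    print(n, f"{error_bound(n, 0.2, 3.0, 16.0, 0.0, 0.0):.3f}")
print(iterations_needed(0.005, 0.2, 3.0, 16.0, 0.0, 0.0))
```

With these inputs the bound starts at $R/(1-\beta) = 20$ and shrinks by the factor $\beta$ at every iteration; a tolerance of 0.005 (two-place accuracy) is first guaranteed at $n = 6$.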

Let there correspond to the $n$th approximant $v_i(n, \psi)$, defined by
(3.2.1) and (3.2.2), the $n$-step optimal sampling strategy $d^*(n)$. At
least one such optimal strategy exists since there are a finite number
of different sampling strategies for the $n$-step problem; there may be
more than one $n$-step optimal strategy. The next theorem demonstrates
that, as $n \to \infty$, any $n$-step optimal strategy converges to an optimal
sampling strategy for the adaptive control model of equation (3.1.5).
We must first precisely define what is meant by convergence of a
sampling strategy.

Let the generalized state $(i, \psi)$ be fixed. To every $n$-step
sampling strategy $d(n)$ there corresponds an ordered pair $(k_n, y_n)$, where
$k_n \in \{1, \ldots, K_i\}$ and $y_n \in [0,1]$ is defined by equation
(3.1.13) with $\delta_\alpha = 0$ for every index $\alpha$ beyond those used
by the $n$-step strategy. Let $d' = (k', y')$ be a sampling strategy for
the infinite horizon model of (3.1.5). Then we say
$\lim_{n \to \infty} d(n) = d'$ if, given $\epsilon > 0$, there is a positive
integer $\nu$ such that, for all $n > \nu$, $k_n = k'$ and
$|y_n - y'| < \epsilon$. This definition implies that, given an arbitrarily
large positive integer $\mu$, there exists an integer $\nu$ such that, for all
$n > \nu$, the values of the decision functions on the first $\mu$ levels of
$d(n)$ are equal to the values of the decision functions on the first $\mu$
levels of $d'$.

Theorem 3.3.5  Let the generalized state $(i, \psi)$ be fixed and let
$A \subset D_i$ denote the set of optimal sampling strategies for the
adaptive control problem of equation (3.1.5). If $d^*(n)$ is an $n$-step
optimal sampling strategy for the problem defined by (3.2.1) and (3.2.2),
then

    $\lim_{n \to \infty} d^*(n) = d^*$                                    (3.3.26)

exists and $d^* \in A$.



Proof.  Let $k_n$ denote the initial alternative selected in the
$n$-step optimal sampling strategy $d^*(n)$ and let $\mathcal{K}$ be the set
of alternatives in state $i$ which are initial selections for an optimal
strategy in $A$. We first show that
$\lim_{n \to \infty} k_n = k \in \mathcal{K}$.

Using the notation of equations (3.2.3) and (3.2.4),

    $v_i(n, \psi) = S_i^{k_n}(v, n-1, \psi)$                              (3.3.27)

and, for any $k \in \mathcal{K}$ and $a \notin \mathcal{K}$,

    $v_i(\psi) = S_i^{k}(v, \infty, \psi) > S_i^{a}(v, \infty, \psi)$.   (3.3.28)

Assume that $\{k_n\}$ does not converge to a member of $\mathcal{K}$. Then
there exists a subsequence $\{k_{n_\nu}\}$ such that

    $k_{n_\nu} \notin \mathcal{K}$,  $\nu = 1, 2, \ldots$                 (3.3.29)

Let $\epsilon$ be chosen such that

    $0 < \epsilon < v_i(\psi) - \max_{a \notin \mathcal{K}} S_i^{a}(v, \infty, \psi)$.   (3.3.30)

Since $\lim_{n \to \infty} v_i(n, \psi) = v_i(\psi)$, we have, by (3.3.27),

    $|v_i(\psi) - S_i^{k_n}(v, n-1, \psi)| < \epsilon / 2$                (3.3.31)

for all $n$ sufficiently large. But, using (3.2.3) and (3.2.4),

    $\lim_{n \to \infty} S_i^{k}(v, n-1, \psi) = S_i^{k}(v, \infty, \psi)$,
        $k = 1, \ldots, K_i$                                              (3.3.32)

and there exists an integer $\nu$ such that, for all $n > \nu$,

    $|S_i^{k_n}(v, n-1, \psi) - S_i^{k_n}(v, \infty, \psi)| < \epsilon / 2$.   (3.3.33)

Then, combining (3.3.31) and (3.3.33), there exists an integer $n$ such
that $k_n \notin \mathcal{K}$ and

    $|v_i(\psi) - S_i^{k_n}(v, \infty, \psi)| < \epsilon$,                (3.3.34)

contradicting (3.3.30). It follows that $\lim_{n \to \infty} k_n = k$ exists
and that $k \in \mathcal{K}$.

Given a positive integer $\mu$, the same proof can be applied to each
selection of an alternative in the first $\mu$ levels of the sampling
strategy $d^*(n)$. Since, having fixed $\mu$, there are a finite number of
such alternatives, there exists a positive integer $\nu$ such that for all
$n > \nu$, the decision functions in the first $\mu$ levels of $d^*(n)$ have
the same values as the corresponding decision functions in some strategy
$d \in A$.  Q.E.D.



3.** flgflmto 

Equations (3.2.1) and (3.2.2) are typical of a class of recursive
equations which appear throughout this report. It is to be noted that,
although these equations resemble a classical iterative formula for
successive approximations, $v_i(n+1, \psi)$ is computed, not in terms of
$v_i(n, \psi)$, but in terms of $v_j(n, T_{ij}^{k}(\psi))$. Computation of
$v_i(n, \psi)$ for a specific value of $(i, \psi)$ involves the evaluation of
between $(k_* N)^{n}$ and $(k^* N)^{n}$ terminal values $V_j(\psi')$, where
$k_* = \min_i K_i$ and $k^* = \max_i K_i$.

One way to compute $v_i(n, \psi)$ is to start by evaluating and storing all
required values of $V_i(\psi) = v_i(0, \psi)$, then to compute and store all
required values of $v_i(1, \psi)$, using the results of the previous
computation of $v_i(0, \psi)$. In general, for $\nu = 1, 2, \ldots, n-1$,
$v_i(\nu, \psi)$ is computed in terms of a grid of values of
$v_i(\nu-1, \psi)$ and is stored for use at the next stage in computing
$v_i(\nu+1, \psi)$.

Since the number of terminal values $V_i(\psi)$ which are needed grows
exponentially with $n$, it is clear that considerable storage capacity is
required. For even moderately large $n$, tape or disk storage must be
used. Moreover, a fairly complex indexing routine must be programmed in
order to utilize core memory efficiently.

An alternative approach is to evaluate $v_i(n, \psi)$ recursively. Using
this method, computation starts with the $n$th level rather than the
zero-level. In general, the routine, at the $(\nu+1)$th level of computation,
starts to evaluate $v_i(\nu+1, \psi)$ for some pair $(i, \psi)$. This level of
computation is suspended when a value at the $\nu$th level,
$v_j(\nu, \psi')$, is required. Certain key portions of the $(\nu+1)$th level
of computation are stored on a push-down list and the routine then calls
itself, entering the $\nu$th level of computation to evaluate
$v_j(\nu, \psi')$. Recursion is halted at the zeroth level when $V_j(\psi')$
is computed. The results of lower level computations are then fed back, in
succession, to higher levels. Having obtained the value of $v_j(\nu, \psi')$
in this manner, the $(\nu+1)$th level of computation reclaims its partially
completed calculations from the push-down list and completes them. This
succession of events continues until $v_i(n, \psi)$ is evaluated.

The storage requirements for recursive calculation of $v_i(n, \psi)$ consist
only of the space needed for storage of intermediate computations on the
push-down list and, therefore, increase linearly with $n$. Thus, the
recursive method has the advantage of requiring considerably less storage
than the first method described. Since specific values of $v_i(\nu, \psi)$,
for $\nu = 0, 1, \ldots, n-1$, may have to be recalculated many times in the
recursive method, we are essentially trading running time for storage. It
should be noted, however, that if the first method requires tape handling,
the recursive method may reduce overall running time.
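
The recursive method can be sketched in a few lines of a modern language, with the language's own call stack playing the role of the push-down list. The Python below is a sketch under stated assumptions, not the MAD program of Appendix B: the names `posterior` and `value` are hypothetical; the matrix beta family is treated as one independent row of parameters per (state, alternative) pair, with the posterior obtained by adding one to the observed count; and the numerical inputs are those read from the example of Section 3.5, where the reward entry 14 is partly illegible in the source and is an assumption.

```python
BETA = 0.2  # discount factor of the Section 3.5 example

# REWARDS[i][k][j]: reward for a transition i -> j under alternative k.
# REWARDS[1][0][0] = 14.0 is an assumption (hard to read in the source).
REWARDS = (((10.0, 4.0), (4.0, 6.0)),
           ((14.0, 3.0), (8.0, 16.0)))

# PRIOR[i][k][j]: matrix beta parameters, one row per (state, alternative).
PRIOR = (((0.0741, 0.0370), (1.0078, 0.3360)),
         ((0.5586, 3.9099), (0.1886, 0.1132)))


def posterior(params, i, k, j):
    """Parameters after observing i -> j under alternative k.

    Assumes the conjugate update: add one to the observed count.
    """
    rows = [list(map(list, state)) for state in params]
    rows[i][k][j] += 1.0
    return tuple(tuple(tuple(row) for row in state) for state in rows)


def value(n, i, params, terminal=0.0):
    """Evaluate v_i(n, psi) by direct recursion on (3.2.1) and (3.2.2)."""
    if n == 0:
        return terminal  # constant terminal function V_i(psi)
    best = float("-inf")
    for k, row in enumerate(params[i]):
        total = sum(row)
        candidate = 0.0
        for j, count in enumerate(row):
            p_bar = count / total  # marginal prior expectation of p_ij^k
            nxt = value(n - 1, j, posterior(params, i, k, j), terminal)
            candidate += p_bar * (REWARDS[i][k][j] + BETA * nxt)
        best = max(best, candidate)
    return best


for n in range(4):
    print(n, round(value(n, 0, PRIOR), 3), round(value(n, 1, PRIOR), 3))
```

Nothing is stored between calls, so memory grows only with the recursion depth $n$, while the number of calls grows like $(KN)^{n}$; this is the trade of running time for storage described above.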

The general theory of recursive computation is described by McCarthy
[29]. Programming languages of the ALGOL family £3S] are capable of
recursive computation, as are most list processing languages. It is
possible to do recursive programming in FORTRAN II M. The recursive
programs which were written for this report used the MAD language [3].

Utilizing the recursive method, a program was written to evaluate
equations (3.2.1) and (3.2.2) for specific pairs $(i, \psi)$ when $P$ has the
matrix beta distribution. This program is contained in Appendix B.
Some numerical results obtained from the program are presented in the
next section.

3*5 Bsaae&&a2i &m&&* 

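Consider a two-state Markov chain with two alternatives in each
state. Let the reward matrix, with one row for each pair $(i, k)$ in the
order $(1,1)$, $(1,2)$, $(2,1)$, $(2,2)$, be

    $R = [r_{ij}^{k}] = \begin{pmatrix} 10 & 4 \\ 4 & 6 \\ 14 & 3 \\ 8 & 16 \end{pmatrix}$.   (3.5.1)

Assume that the prior distribution of $P$ is a matrix beta distribution
with parameter

    $M = \begin{pmatrix} 0.0741 & 0.0370 \\ 1.0078 & 0.3360 \\
                         0.5586 & 3.9099 \\ 0.1886 & 0.1132 \end{pmatrix}$   (3.5.2)

and mean

    $\bar{P} = \begin{pmatrix} 0.667 & 0.333 \\ 0.750 & 0.250 \\
                               0.125 & 0.875 \\ 0.625 & 0.375 \end{pmatrix}$.   (3.5.3)

Letting

    $p = (p_{11}^{1}, p_{12}^{1}, p_{11}^{2}, p_{12}^{2},
          p_{21}^{1}, p_{22}^{1}, p_{21}^{2}, p_{22}^{2})$

and $\bar{p}$ denote its prior mean, the variance-covariance matrix of this
distribution is

    $E[(p - \bar{p})^{t} (p - \bar{p})] =
    \begin{pmatrix}
     0.200 & -0.200 & 0 & 0 & 0 & 0 & 0 & 0 \\
    -0.200 &  0.200 & 0 & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 &  0.080 & -0.080 & 0 & 0 & 0 & 0 \\
     0 & 0 & -0.080 &  0.080 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & 0 &  0.020 & -0.020 & 0 & 0 \\
     0 & 0 & 0 & 0 & -0.020 &  0.020 & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 &  0.180 & -0.180 \\
     0 & 0 & 0 & 0 & 0 & 0 & -0.180 &  0.180
    \end{pmatrix}$.                                                       (3.5.4)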

Let the discount factor be $\beta = 0.2$. Table 3.5.1 lists values of

    $v(n, \psi) = (v_1(n, \psi), v_2(n, \psi))$                           (3.5.5)

using the terminal functions

    $V_i(\psi) = 0.000$,  $i = 1, 2$;  $\psi \in \Psi$.                   (3.5.6)

  n    v_1(n,ψ)   v_2(n,ψ)    σ*(n)     e_1(n,ψ)   e_2(n,ψ)    Δ(n)

  0     0.000      0.000        -        10.517     13.667     20.000
  1     8.002     10.999      (1,2)       2.515      2.668      4.000
  2    10.042     13.112      (1,2)       0.475      0.555      0.800
  3    10.418     13.562      (1,2)       0.099      0.105      0.160
  4    10.499     13.646      (1,2)       0.018      0.021      0.032
  5    10.514     13.664      (1,2)       0.003      0.003      0.006
  6    10.517     13.667      (1,2)       0.000      0.000      0.001

  β = 0.2      Computation time: 5 minutes.      V(ψ) = (0.000, 0.000)

                              Table 3.5.1



  n    v_1(n,ψ)   v_2(n,ψ)    σ*(n)     e_1(n,ψ)   e_2(n,ψ)    Δ(n)

  0     3.750      3.750        -         6.767      9.917     16.250
  1     8.752     11.749      (1,2)       1.765      1.918      3.250
  2    10.192     13.262      (1,2)       0.325      0.405      0.650
  3    10.448     13.592      (1,2)       0.069      0.075      0.130
  4    10.505     13.652      (1,2)       0.012      0.015      0.026
  5    10.515     13.665      (1,2)       0.002      0.002      0.005
  6    10.517     13.667      (1,2)       0.000      0.000      0.001

  β = 0.2                                        V(ψ) = (3.750, 3.750)

                              Table 3.5.2



  n    v_1(n,ψ)      v_2(n,ψ)    σ*(n)     e_1(n,ψ)      e_2(n,ψ)    Δ(n)

  0    10.253        13.278        -        0.265         0.390      9.747
  1    10.254        13.276      (1,2)      0.264         0.392      1.949
  2    [illegible]   13.586      (1,2)      [illegible]   0.082      0.390
  3    10.507        13.652      (1,2)      0.011         0.016      0.078
  4    10.516        13.665      (1,2)      0.002         0.003      0.016
  5    10.517        13.667      (1,2)      0.001         0.001      0.003
  6    10.518        13.668      (1,2)      0.000         0.000      0.001

  β = 0.2      Computation time: 5               V(ψ) = (10.253, 13.278)

                              Table 3.5.3



Since $r = 3 > 0$, the convergence is monotone increasing. It is seen that
convergence to two decimal places has occurred by the sixth iteration.
The optimal initial policy vector

    $\sigma^*(n) = (\sigma_1^*(n), \sigma_2^*(n))$                        (3.5.7)

is recorded in the third column, where $\sigma_i^*(n)$ is the initial
decision $k_i^*$ when the system starts from state $i$ and the $n$-step
optimal sampling strategy is $d(n) = (k_i^*, \delta(n))$.

Using $v(6, \psi)$ as the limiting vector, $v(\psi)$, the error vector of
the $n$th iterate, $e(n, \psi)$, was computed as

    $e(n, \psi) = (10.517, 13.667) - v(n, \psi)$                          (3.5.8)

and is displayed in the fourth column of Table 3.5.1. The last column of
the table contains the error bound

    $\Delta(n) = \beta^{n} \max \{ \frac{R}{1-\beta} - V_*,\;
        V^* - \frac{r}{1-\beta} \}$

defined by equation (3.3.15). The error bound, $\Delta(n)$, accurately
predicts, in this example, the number of iterations required for two-place
accuracy.

The computations shown in Table 3.5.2 are similar to those of Table
3.5.1, except that the terminal functions are

    $V_i(\psi) = \frac{r}{1-\beta} = 3.75$,  $i = 1, 2$.                  (3.5.9)

The convergence is still monotone increasing and the error functions,
$e_i(n, \psi)$, are reduced to approximately 2/3 of the corresponding values
in Table 3.5.1. Five iterations are required in this case for two-place
accuracy.

In Table 3.5.3, the terminal functions are the maximum expected
discounted rewards when the system is operated indefinitely under a single
policy (in this case, the policy (1,2); cf. Section 4.3),

    $V(\psi) = (10.253, 13.278)$.                                         (3.5.10)

Convergence is monotonic after the first iteration. The error vector is
significantly reduced as compared with corresponding entries in Tables
3.5.1 and 3.5.2. Four iterations are necessary to obtain two-place
accuracy in this instance.

3*6 ^n^fffflffffl&ffi dt S33jS^S33SLiv 

When the discount factor is unity, the criterion of maximizing the
total expected reward over an infinite period is no longer useful since,
with the possible exception of a set of sample histories of measure zero,
the expected reward over an infinite period under any strategy diverges
to $+\infty$ or $-\infty$. An alternative criterion is to maximize the
expected rate at which the process earns rewards in the steady state, or the
expected gain of the process. But this criterion is not really precise,
since the decision-maker can, in the adaptive control process, change
alternatives in any state at any time, and it is not certain that a
steady state will ever be reached. Moreover, among those strategies which
do lead to a steady state and which maximize the gain, there are an
arbitrarily large number which are virtually equivalent: those strategies
in which each alternative is sampled a large (but finite) number of times,
then a fixed policy is chosen under almost perfect information. Further
remarks on this class of strategies will be made in Section 3.5.

Since, for each $\beta \in (0,1)$, there is a well-defined criterion leading
to an optimal policy, an alternative approach to the undiscounted process
is to let $\beta \to 1$ in the discounted adaptive control problem as
formulated in equation (3.1.5). For fixed $\psi$, let
$\sigma(\beta) = (\sigma_1(\beta), \ldots, \sigma_N(\beta))$ be an optimal
initial policy, where $\sigma_i(\beta)$ is the maximizing value of $k$ in
equation (3.1.5) for a fixed $\psi \in \Psi$. We shall call $\sigma(\beta)$ a
$\beta$-optimal policy. If, for some $\beta_0 \in (0,1)$,

    $\sigma(\beta) = \sigma_0$  for all $\beta_0 \le \beta < 1$,          (3.6.1)

we shall call $\sigma_0$ an optimal initial policy for the undiscounted
adaptive control problem. The existence and nature of optimal policies as
defined by (3.6.1) are matters for future investigation. Blackwell [11]
and Derman [16] have used this approach to undiscounted decision problems
in a Markov chain with alternatives when the transition probabilities are
known with certainty.



CHAPTER 4
EXPECTED STEADY-STATE PROBABILITIES
AND RELATED QUANTITIES

Consider a Markov chain with alternatives which is operated under a
fixed policy, $\sigma$. Let $P(\sigma)$ denote the $N \times N$ matrix of
transition probabilities, assumed to have the prior distribution
$F(P \mid \psi)$. In this chapter we examine some functions of $P$ which are
of importance in decision problems, with particular attention devoted to the
problem of computing the means, variances, and covariances of these
quantities.

Section 4.1 deals with the $n$-step transition probabilities and with
the expected discounted reward over $n$ transitions. The second section is
concerned with the steady-state probability vector. In Section 4.3 we
consider the expected discounted reward over an infinite number of
transitions when a fixed policy, $\sigma$, is used and, in the final section,
some results concerning the expected reward per transition, or process gain,
are presented. These quantities are, of course, important on their own
merits; the results derived here will be applied to various terminal
control models in the following chapter.

Throughout this chapter it will be assumed that a specific policy,
$\sigma$, is in force and that the Markov chain is governed by the
$N \times N$ stochastic matrix, $P = P(\sigma)$, and that the matrix of
transition rewards is $R(\sigma) = R = [r_{ij}]$. In most cases, the
dependence of various functions on $\sigma$ will not be made explicit in
order to simplify the notation to some extent.

If $P$ is a stochastic matrix governing the transitions in a Markov
chain, then the probability that the system is in state $j$ after $n$
transitions, given that the system started in state $i$, is the $(i,j)$th
element of the $n$th power of $P$, and is denoted $p_{ij}^{(n)}$. When $P$ is
a random matrix $p_{ij}^{(n)}$ is a random variable. In this section we
derive expressions for the expected value of $p_{ij}^{(n)}$ and the
covariance of $p_{\alpha\beta}^{(n)}$ and $p_{\gamma\delta}^{(\nu)}$, and
examine a related quantity, the expected discounted reward over $n$
transitions. Silver, using different methods, has considered the expected
value of $p_{ij}^{(n)}$, assuming a matrix beta prior distribution for $P$,
and has presented numerical results for a two-state process.

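Theorem 4.1.1  If the prior distribution function of $P$ is
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under
consecutive sampling, and if

    $\bar{p}_{ij}^{(n)}(\psi) = \int p_{ij}^{(n)}\, dF(P \mid \psi)$,
        $i, j = 1, \ldots, N$;  $n = 1, 2, \ldots$                        (4.1.1)

is the expected value of $p_{ij}^{(n)}$, then $\bar{p}_{ij}^{(n)}(\psi)$ can
be computed recursively from the following equations:

    $\bar{p}_{ij}^{(n+1)}(\psi) = \sum_{k=1}^{N} \bar{p}_{ik}(\psi)\,
        \bar{p}_{kj}^{(n)}(T_{ik}(\psi))$,  $n = 1, 2, \ldots$            (4.1.2a)

    $\bar{p}_{ij}^{(1)}(\psi) = \bar{p}_{ij}(\psi)$.                      (4.1.2b)

Proof.  Since, for $n = 1, 2, \ldots$ and every stochastic matrix $P$,

    $p_{ij}^{(n+1)} = \sum_{k=1}^{N} p_{ik}\, p_{kj}^{(n)}$,

Lemma 2.3.2 yields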
1 C3B] e pp. 88-8?. 



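    $\bar{p}_{ij}^{(n+1)}(\psi)
        = \sum_{k=1}^{N} \int p_{ik}\, p_{kj}^{(n)}\, dF(P \mid \psi)
        = \sum_{k=1}^{N} \bar{p}_{ik}(\psi)\, \bar{p}_{kj}^{(n)}(T_{ik}(\psi))$.   (4.1.3)

Since $p_{ij}^{(n)}$ is a continuous function of $P$ for
$i, j = 1, \ldots, N$ and $n = 1, 2, \ldots$, the integrals in (4.1.1) and
(4.1.3) exist.  Q.E.D.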
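
The recursion (4.1.2a) and (4.1.2b) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the prior is taken to be a matrix beta distribution with independent rows, and the posterior $T_{ik}(\psi)$ is obtained by adding one to the count of the observed transition. The small prior used in the example is invented for illustration and does not come from the report.

```python
def bump(params, i, k):
    """Parameters after observing a transition i -> k (add one to that count)."""
    return tuple(
        tuple(c + 1.0 if (a == i and b == k) else c for b, c in enumerate(row))
        for a, row in enumerate(params)
    )


def expected_n_step(n, i, j, params):
    """Prior expectation of the n-step transition probability from i to j."""
    row = params[i]
    total = sum(row)
    if n == 1:
        return row[j] / total  # (4.1.2b): the prior mean of p_ij
    result = 0.0
    for k, count in enumerate(row):
        # (4.1.2a): condition on the first transition i -> k
        result += (count / total) * expected_n_step(n - 1, k, j, bump(params, i, k))
    return result


prior = ((2.0, 1.0), (1.0, 3.0))  # hypothetical two-state prior
print(expected_n_step(2, 0, 0, prior))
```

For this prior the recursion gives $\frac{2}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{4} = \frac{7}{12}$, whereas squaring the matrix of prior means would give $\frac{19}{36}$; the expected value of a power of $P$ is not the power of the expected value.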

Theorem 4.1.2  If the prior distribution function of $P$ is
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous in
$\psi$, then for $i, j = 1, \ldots, N$ and $n = 1, 2, \ldots$,
$\bar{p}_{ij}^{(n)}(\psi)$ is a continuous function of $\psi$.

P£gg£« The theorem follows directly from Theorem £.&*3» Q.E.D, 

Theorem 4.1.3 If $\tilde{P}$ has the distribution function
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under
consecutive sampling, and if


$\alpha, \beta, \gamma, \delta = 1, \ldots, N$; $\quad n, \nu = 1, 2, \ldots$

then, for $n > 1$, $\nu > 1$,



* s ofir 1 ^ i w*' ))i » (w - 5 > 



while, for $n = 1$ or $\nu = 1$,



s oy$r \ 1 1 - v +) ** ( W t}) (M - 6) 



integrals (4.1.4) exist. Applying Lemma 2.3.2 twice, we obtain, for the
case $n > 1$, $\nu > 1$,

B V )5 S )< V t)) 

and similarly for $\nu = 1$. Q.E.D.

Let us now consider $\bar{q}_i^{(n)}(\beta, \psi)$, the prior expected
discounted reward in $n$ transitions when the system starts in state $i$
and $F(P \mid \psi)$ is the marginal prior distribution function of
$\tilde{P}$. This expectation will be required for one of the terminal
control models of Chapter 5. Let $q_i^{(n)}(\beta, P)$ be the
corresponding expected discounted reward given that $\tilde{P} = P$.



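Theorem 4.1.4 If the prior distribution function of $\tilde{P}$ is
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under
consecutive sampling, then $\bar{q}_i^{(n)}(\beta, \psi)$ can be
computed recursively from the following equations:

$$\bar{q}_i^{(n+1)}(\beta, \psi) = \sum_{j=1}^{N} \bar{p}_{ij}(\psi)
  \left[ r_{ij} + \beta\, \bar{q}_j^{(n)}(\beta, T_{ij}(\psi)) \right],
  \qquad i = 1, \ldots, N; \; n = 1, 2, \ldots \tag{4.1.9a}$$

$$\bar{q}_i^{(1)}(\beta, \psi) = \sum_{j=1}^{N} \bar{p}_{ij}(\psi)\, r_{ij},
  \qquad i = 1, \ldots, N; \; 0 \le \beta \le 1, \tag{4.1.9b}$$

where $R = [r_{ij}]$ is the reward matrix.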

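Proof. For $n = 1, 2, \ldots$ and all $P \in \mathcal{S}_N$,
$q_i^{(n)}(\beta, P)$ satisfies the following renewal equation,

$$q_i^{(n+1)}(\beta, P) = \sum_{j=1}^{N} p_{ij}
  \left[ r_{ij} + \beta\, q_j^{(n)}(\beta, P) \right],
  \qquad i = 1, \ldots, N.$$

Then, using Lemma 2.3.2,

$$\bar{q}_i^{(n+1)}(\beta, \psi)
  = \int_{\mathcal{S}_N} q_i^{(n+1)}(\beta, P)\, dF(P \mid \psi)
  = \sum_{j=1}^{N} \bar{p}_{ij}(\psi)
    \left[ r_{ij} + \beta\, \bar{q}_j^{(n)}(\beta, T_{ij}(\psi)) \right],$$

which is (4.1.9a). Since $\bar{q}_i^{(1)}(\beta, \psi)$ is
$\bar{q}_i(\psi)$ as defined by equation (3.1.4), equation (4.1.9b)
follows. Q.E.D.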
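The reward recursion of Theorem 4.1.4 can be sketched in the same way. The sketch below again assumes independent Dirichlet rows with the add-one posterior rule; `expected_reward` and its arguments are hypothetical names, and the reward matrices used are arbitrary illustrations rather than values from the text.

```python
def mean_row(psi, i):
    """Prior means of row i of the transition matrix."""
    total = sum(psi[i])
    return [m / total for m in psi[i]]

def updated(psi, i, j):
    """T_ij(psi): add one observed transition i -> j (assumed posterior rule)."""
    rows = [list(row) for row in psi]
    rows[i][j] += 1.0
    return rows

def expected_reward(psi, rewards, beta, n, i):
    """Prior expected discounted reward over n transitions from state i."""
    total = 0.0
    for j, p in enumerate(mean_row(psi, i)):
        future = 0.0
        if n > 1:
            # Remaining n - 1 transitions start in state j with the
            # parameter updated for the observed transition i -> j.
            future = expected_reward(updated(psi, i, j), rewards, beta, n - 1, j)
        total += p * (rewards[i][j] + beta * future)
    return total

psi = [[0.251, 0.352], [0.617, 1.120]]
unit_rewards = [[1.0, 1.0], [1.0, 1.0]]
four_step = expected_reward(psi, unit_rewards, 0.9, 4, 0)
```

A convenient check is that, when every transition earns a unit reward, the result must equal $1 + \beta + \cdots + \beta^{n-1}$ whatever the prior, because the mean transition probabilities in each row sum to one.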

For the case $\beta = 1$ another method is available for evaluating
$\bar{q}_i^{(n)}(1, \psi)$. In a sample of $n$ observations let $f_{jk}$
be the number of transitions observed from state $j$ to state $k$ and
let $F = [f_{jk}]$, an $N \times N$ matrix, be the transition count of
the sample. Prior to the observation of the sample $\tilde{F}$ is a
random matrix and, given the initial state $i$, the number of
transitions $n$, and the prior distribution of $\tilde{P}$, we can find
the distribution of $\tilde{F}$, unconditional with regard to
$\tilde{P}$. Let $\bar{F} = [\bar{f}_{jk}]$ be the mean of this
unconditional sampling distribution; then

$$\bar{q}_i^{(n)}(1, \psi) = \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{f}_{jk}\, r_{jk}. \tag{4.1.11}$$

If the prior distribution of $\tilde{P}$ is the matrix beta
distribution, then the distribution of $\tilde{F}$, unconditional with
regard to $\tilde{P}$, is the beta-Whittle distribution, which is
discussed in Section 6.5.

Let $P$ be an ergodic stochastic matrix. Then there is associated with
$P$ a unique vector of steady-state probabilities,
$\pi(P) = (\pi_1, \ldots, \pi_N)$, where $\pi_i = \pi_i(P)$ is the
steady-state probability that the system is in state $i$
($i = 1, \ldots, N$). The vector $\pi$ satisfies the following system
of simultaneous equations:

$$\pi = \pi P \tag{4.2.1a}$$

$$\sum_{i=1}^{N} \pi_i = 1. \tag{4.2.1b}$$
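For a given ergodic matrix the system (4.2.1) is a small linear system, and a direct solution is easy to sketch. The routine below is an illustration only (the name `steady_state` is hypothetical): it drops one of the redundant balance equations, appends the normalization, and applies Gaussian elimination with partial pivoting. The example matrix is the prior mean listed with Table 4.2.1.

```python
def steady_state(P):
    """Solve pi = pi P with sum(pi) = 1 for an ergodic stochastic matrix P."""
    n = len(P)
    # Rows 0..n-2 encode the balance equations sum_i pi_i (p_ij - delta_ij) = 0
    # for j = 0..n-2; the last row encodes the normalization sum_i pi_i = 1.
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)]
         for j in range(n - 1)]
    A.append([1.0] * n)
    b = [0.0] * (n - 1) + [1.0]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi

P_mean = [[0.93454, 0.06546], [0.86357, 0.13643]]
pi_of_mean = steady_state(P_mean)
```

For this matrix the result agrees with the vector $\pi(\bar{P})$ reported with Table 4.2.1.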

If $\tilde{P}$ is a random matrix with an arbitrary distribution
function $F(P \mid \psi)$ which satisfies a mild continuity condition,
we show below that the subset of non-ergodic matrices in
$\mathcal{S}_N$ is a set of measure zero. Thus, it is meaningful to
speak of the random vector $\tilde{\pi}$.

We are chiefly concerned, in this section, with the expected value,
$\bar{\pi}(\psi) = (\bar{\pi}_1(\psi), \ldots, \bar{\pi}_N(\psi))$, of
$\tilde{\pi}$. It is shown that this expectation exists and that
$\lim_{n \to \infty} \bar{p}_{ij}^{(n)}(\psi) = \bar{\pi}_j(\psi)$. We
then assume that $F(P \mid \psi) \in \mathcal{F}$, a family of
distributions closed under the consecutive sampling rule, and derive a
functional equation for $\bar{\pi}(\psi)$. Methods of successive
approximations based on this equation are discussed, then some
numerical results are presented. We conclude with a discussion of the
covariance of $\tilde{\pi}_i$ and $\tilde{\pi}_j$.

*.s.i ^&ate^a£J^i&»fcsa£i:« 

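Let us now consider conditions on the prior distribution of $\tilde{P}$
which insure the existence of the general joint moment of the elements
of $\tilde{\pi}$,

$$E\Big[ \prod_{i=1}^{N} \tilde{\pi}_i^{\nu_i} \,\Big|\, \psi \Big]
  = \int \prod_{i=1}^{N} \pi_i^{\nu_i} \, dF(P \mid \psi), \tag{4.2.2}$$

where the $\nu_i$ are nonnegative integers.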

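Let

$$\mathcal{S}_N^{*} = \{ P \mid P \in \mathcal{S}_N,\; p_{ij} > 0,\; i,j = 1, \ldots, N \} \tag{4.2.3}$$

be the set of all positive stochastic matrices and, for $0 < a < 1$,
define the set

$$\mathcal{S}_N^{a} = \{ P \mid P \in \mathcal{S}_N,\; p_{ij} \ge a,\; i,j = 1, \ldots, N \}. \tag{4.2.4}$$

We remark that $\mathcal{S}_N^{a}$ is a closed and bounded, hence
compact, subset of $E^{N^2}$ and that
$\mathcal{S}_N^{a} \subset \mathcal{S}_N^{*} \subset \mathcal{S}_N$ for
any $a$ in the open interval $(0,1)$.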

For fixed $a \in (0,1)$, let $S(a)$ be any subset of $E^{N^2}$ such that

n 

jf u - 4, a * S<o) <*.2.5»> 

and 

Thus, for all $a \in (0,1)$, the boundary of $\mathcal{S}_N$ is a proper
subset of $S(a)$.

If, for some $a \in (0,1)$, there exists a set $S(a)$ satisfying (4.2.5)
such that $F(P \mid \psi)$, the prior distribution function of
$\tilde{P}$, is continuous on $S(a)$, then $F(P \mid \psi)$ is said to
be continuous on the boundary of $\mathcal{S}_N$. If $\mathcal{F}$ is a
family of distributions indexed by $\psi$ every member of which is
continuous on the boundary of $\mathcal{S}_N$, then $\mathcal{F}$ is
also said to be continuous on the boundary of $\mathcal{S}_N$.



The following lemma shows that continuity of $F(P \mid \psi)$ on the
boundary of $\mathcal{S}_N$ is a necessary and sufficient condition for
the set of boundary points, $\mathcal{S}_N - \mathcal{S}_N^{*}$, to be a
set of measure zero. If $P \in \mathcal{S}_N^{*}$, then $P$ consists of
a single chain with no transient states. Thus, the subset of
$\mathcal{S}_N$ which includes all periodic and multiple-chain
transition matrices, as well as those single-chain transition matrices
which have transient states, is contained within
$\mathcal{S}_N - \mathcal{S}_N^{*}$. The import of Lemma 4.2.1 is that,
provided $F(P \mid \psi)$ is continuous on the boundary of
$\mathcal{S}_N$, we need only consider transition matrices in
$\mathcal{S}_N^{*}$. In this case, with probability one, $\tilde{\pi}$
exists and, moreover, $\tilde{\pi}_i > 0$ ($i = 1, \ldots, N$).

Lemma 4.2.1 If $F(P \mid \psi)$ is the prior distribution function of
$\tilde{P}$, then a necessary and sufficient condition that
$\mathcal{S}_N - \mathcal{S}_N^{*}$ be a set of measure zero relative to
the prior distribution is that $F(P \mid \psi)$ be continuous on the
boundary of $\mathcal{S}_N$.

B 

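Proof. For all $a \in (0,1)$, define the sets

$$C_{ij}(a) = \{ P \mid P \in \mathcal{S}_N,\; 0 \le p_{ij} \le a \},
  \qquad i,j = 1, \ldots, N. \tag{4.2.6}$$

Then

$$\mathcal{S}_N - \mathcal{S}_N^{*} \subset \mathcal{S}_N - \mathcal{S}_N^{a}
  \subset \bigcup_{i=1}^{N} \bigcup_{j=1}^{N} C_{ij}(a),
  \qquad 0 < a < 1, \tag{4.2.7}$$

and, for all $a \in (0,1)$, the probability measure of the set
$\mathcal{S}_N - \mathcal{S}_N^{*}$ has the bound

$$\int_{\mathcal{S}_N - \mathcal{S}_N^{*}} dF(P \mid \psi)
  \le \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{C_{ij}(a)} dF(P \mid \psi),
  \qquad 0 < a < 1. \tag{4.2.8}$$

If $F_{ij}(p_{ij} \mid \psi)$ is the marginal distribution function of
$\tilde{p}_{ij}$, then

$$\int_{C_{ij}(a)} dF(P \mid \psi) = F_{ij}(a \mid \psi),
  \qquad i,j = 1, \ldots, N. \tag{4.2.9}$$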

Assume that, for fixed $a \in (0,1)$, there exists a set $S(a)$
satisfying (4.2.5) on which $F(P \mid \psi)$ is continuous. Let
$\epsilon > 0$ be given. Since $F(P \mid \psi)$ is continuous on $S(a)$
we may choose an $a'$ such that $0 < a' < a$ and



Tbm 






*3 Br 






"* -«M 



and, since $\epsilon$ is arbitrary, $\mathcal{S}_N - \mathcal{S}_N^{*}$
is a set of measure zero, proving sufficiency.

To demonstrate necessity it suffices to note that, if there does not
exist a set $S(a)$ which satisfies the conditions of the lemma, then
$F(P \mid \psi)$ must assign positive probability to at least one of the
boundary points of $\mathcal{S}_N$. Q.E.D.

We remark that, in the case of the matrix beta distribution, the
existence of a density function implies that the corresponding
distribution function is continuous on $E^{N^2}$. Therefore, the family
of matrix beta distributions is continuous on the boundary of
$\mathcal{S}_N$.

Theorem 4.2.2 If the prior distribution function of $\tilde{P}$,
$F(P \mid \psi)$, is continuous on the boundary of $\mathcal{S}_N$, then
the joint moments (4.2.2) exist for all nonnegative integers $\nu_i$.
If, furthermore, $F(P \mid \psi) \in \mathcal{F}$, a family of
distributions continuous in $\psi$, then
$E[\prod_{i=1}^{N} \tilde{\pi}_i^{\nu_i} \mid \psi]$ is a continuous
function of $\psi$.

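Proof. By Lemma 4.2.1,

$$E\Big[ \prod_{i=1}^{N} \tilde{\pi}_i^{\nu_i} \,\Big|\, \psi \Big]
  = \int_{\mathcal{S}_N^{*}} \prod_{i=1}^{N} \pi_i^{\nu_i} \, dF(P \mid \psi). \tag{4.2.12}$$

Let $D_{jj}(P)$ be the cofactor of the $j$th diagonal element of the
matrix

$$D(P) = [P - I]. \tag{4.2.13}$$

It can be shown¹ that, for all $P \in \mathcal{S}_N^{*}$,

$$\pi_j(P) = \frac{D_{jj}(P)}{\sum_{k=1}^{N} D_{kk}(P)},
  \qquad j = 1, \ldots, N. \tag{4.2.14}$$

Since $D_{jj}(P)$ is a sum of products involving elements of $P$,
$\pi_j(P)$ is a continuous bounded function on $\mathcal{S}_N^{*}$ and
the integral (4.2.12) exists. When $F(P \mid \psi) \in \mathcal{F}$, a
family of distributions continuous in $\psi$, continuity of
$E[\prod_{i=1}^{N} \tilde{\pi}_i^{\nu_i} \mid \psi]$ follows from
Theorem 2.4.3. Q.E.D.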
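As a check on the representation of $\pi_j(P)$ as a ratio of cofactors of $D(P) = P - I$ used in this proof, the two-state case can be worked out by hand. The short derivation below is an added illustration; the numerical matrix is the prior mean listed with Table 4.2.1.

```latex
D(P) = \begin{pmatrix} p_{11} - 1 & p_{12} \\ p_{21} & p_{22} - 1 \end{pmatrix},
\qquad
D_{11}(P) = p_{22} - 1 = -p_{21}, \qquad
D_{22}(P) = p_{11} - 1 = -p_{12},

\pi_1(P) = \frac{D_{11}(P)}{D_{11}(P) + D_{22}(P)} = \frac{p_{21}}{p_{12} + p_{21}},
\qquad
\pi_2(P) = \frac{p_{12}}{p_{12} + p_{21}}.

% Example: p_{12} = 0.06546, \; p_{21} = 0.86357
\pi_1 = \frac{0.86357}{0.06546 + 0.86357} \approx 0.92954,
\qquad
\pi_2 \approx 0.07046 .
```

Both cofactors have the same sign whenever $p_{12}$ and $p_{21}$ are positive, so the ratio is well defined and continuous on the set of positive stochastic matrices, as the proof requires.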

Theorem 4.2.2 can be proved under the weaker condition that the set of
nonergodic transition probability matrices in $\mathcal{S}_N$ is a set
of measure zero, but this criterion is more difficult to apply in
practice than that of continuity on the boundary. There are many
problems, however, in which it is necessary to assign positive
probability to ergodic matrices on the boundary of $\mathcal{S}_N$. For
example, in some random walk models $\tilde{P}$ is known to be a Jacobi
matrix, that is, $\tilde{p}_{ij} = 0$ with probability one if
$|i - j| > 1$. In this case the theory presented here can be applied by
assigning a prior distribution to an $N \times 3$ generalized stochastic
matrix whose $i$th row consists of the elements $\tilde{p}_{i,i-1}$,
$\tilde{p}_{i,i}$, $\tilde{p}_{i,i+1}$. This technique can be applied to
any ergodic transition probability matrix in which some elements are
known to be zero.



4.2.2  We now establish that, if $\bar{p}_{ij}^{(n)}(\psi)$ is the mean
$n$-step transition probability defined by equation (4.1.1), then

$$\lim_{n \to \infty} \bar{p}_{ij}^{(n)}(\psi) = \bar{\pi}_j(\psi),
  \qquad i,j = 1, \ldots, N, \tag{4.2.15}$$

where $\bar{\pi}(\psi) = (\bar{\pi}_1(\psi), \ldots, \bar{\pi}_N(\psi))$
is the expected value of $\tilde{\pi}$. The only assumption which is
made about $F(P \mid \psi)$, the prior distribution function of
$\tilde{P}$, is that $F(P \mid \psi)$ is continuous on the boundary of
$\mathcal{S}_N$. In order to establish (4.2.15) it must first be shown
that, for any fixed $a \in (0,1)$,
$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j(P)$ uniformly in $P$ on
$\mathcal{S}_N^{a}$. This is the content of the following two lemmas.

¹ Singer [39]. This result was apparently first discovered by Mihoc in
1934 in his doctoral thesis (Romanian). Cf. Rosenblatt [34].

Lemma 4.2.3 For some fixed $a \in (0,1)$ let

be a function of $P$ on $\mathcal{S}_N^{a}$. Then $\Delta(P)$ is
continuous on $\mathcal{S}_N^{a}$.

Proof. Let $\epsilon > 0$ be given. It must be shown that, for any fixed
$P \in \mathcal{S}_N^{a}$, there exists a $\delta > 0$ such that
$|\Delta(P) - \Delta(Q)| < \epsilon$ whenever


and l<s£ 



« fo R -p.*>0 (fe.2.17) 

Its ij 



tihere r is tho ss&Lost element of P not equal to n. . (aastss&ng* for 

3® at *j 

the aoraentt that suoh en eXsoont exists) • Choose 6 » ^^ C«» €*/<:]* 
then, for any QeiL a » 2J? IIP • Q|| <«» «© baTO 




¹ This result can be proved under the weaker condition that the set of
nonergodic matrices in $\mathcal{S}_N$ is a set of measure zero, using
the bounded convergence theorem of measure theory. The proof given here
brings out some interesting features of the convergence of
$\bar{p}_{ij}^{(n)}(\psi)$ to $\bar{\pi}_j(\psi)$ on $\mathcal{S}_N$
and does not require a knowledge of measure theory.



lot 






asanas^ £bp ife© soaent* to be &»ogpty» l&en* tisSjsg (fe*2*S?) and tbe 

eMknitto of n » 
km 

Thaa # tg? C^»2.i8), 

If <L b ^ {%«\ » C^2 # 2l) Impales that (Piy)^** aa*» therefore, 

th6t p 4 . op # ffess8 t 

appose nsw that tbere is no smallest element of P mt equal to pu 4 « 
Hien S^a is sc?% aad a, . a ^ (l»£*i» •••# !?)♦ Ckoosts^ 5 s 3? 



$\| P - Q \| < \delta$ implies that

and, hence, that

$$|\Delta(P) - \Delta(Q)| < \epsilon, \tag{4.2.24}$$

proving the lemma. Q.E.D.

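Lemma 4.2.4 Let $a \in (0,1)$ be fixed. Then, for $i,j = 1, \ldots, N$,

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j(P) \tag{4.2.25}$$

uniformly in $P$ on $\mathcal{S}_N^{a}$.

Proof. Let $\epsilon > 0$ be given. It must be shown that there exists a
positive integer $\nu$ such that

$$|p_{ij}^{(n)} - \pi_j(P)| < \epsilon, \qquad n > \nu, \tag{4.2.26}$$

for all $P \in \mathcal{S}_N^{a}$. This will be done by showing that the
sequence of functions $\{d_n(P)\}$ defined below goes to zero uniformly
on $\mathcal{S}_N^{a}$ as $n \to \infty$. Define the functions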



S&noa < ^<£) £ f So* any fis^t 






aisd 



By Lemma 4.2.3 and equations (4.2.29) and (4.2.30), $\{d_n(P)\}$ is a
monotonically decreasing sequence of functions which are continuous on
the compact set $\mathcal{S}_N^{a}$. Moreover, the sequence converges to
the continuous function zero. Therefore,¹ $\{d_n(P)\} \to 0$ uniformly
on $\mathcal{S}_N^{a}$. It is easily established² that

$$|p_{ij}^{(n)} - \pi_j(P)| \le d_n(P).$$

By choosing a positive integer $\nu$ such that $0 \le d_n(P) < \epsilon$
for all $n > \nu$ and all $P \in \mathcal{S}_N^{a}$, (4.2.26) is
obtained. Q.E.D.

¹ Rudin [35], p. 136.
² Kemeny and Snell [26], p. 71.



Theorem 4.2.5 Let $F(P \mid \psi)$ be the prior distribution function of
the random $N \times N$ stochastic matrix $\tilde{P}$ and let

o 












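Proof. Let $\epsilon > 0$ be given. For any $a \in (0,1)$,

$$|\bar{p}_{ij}^{(n)}(\psi) - \bar{\pi}_j(\psi)|
  \le \int_{\mathcal{S}_N^{a}} |p_{ij}^{(n)} - \pi_j(P)| \, dF(P \mid \psi)
  + \int_{\mathcal{S}_N - \mathcal{S}_N^{a}} |p_{ij}^{(n)} - \pi_j(P)| \, dF(P \mid \psi). \tag{4.2.35}$$

Let $C_{\alpha\beta}(a)$ be defined by equation (4.2.6) and let
$F_{\alpha\beta}(p_{\alpha\beta} \mid \psi)$ be the marginal
distribution function of $\tilde{p}_{\alpha\beta}$. Then, noting that
$|p_{ij}^{(n)} - \pi_j(P)| \le 1$, the second integral of (4.2.35) has
the bound

$$\int_{\mathcal{S}_N - \mathcal{S}_N^{a}} |p_{ij}^{(n)} - \pi_j(P)| \, dF(P \mid \psi)
  \le \sum_{\alpha=1}^{N} \sum_{\beta=1}^{N} F_{\alpha\beta}(a \mid \psi),
  \qquad 0 < a < 1. \tag{4.2.36}$$

Since $F(P \mid \psi)$ is continuous on the boundary of $\mathcal{S}_N$,
there is an $a^{*} \in (0,1)$ and a set $S(a^{*})$ satisfying equation
(4.2.5) such that $F(P \mid \psi)$ is continuous on $S(a^{*})$. In
equation (4.2.35) let $a < a^{*}$ be chosen such that

$$\int_{\mathcal{S}_N - \mathcal{S}_N^{a}} |p_{ij}^{(n)} - \pi_j(P)| \, dF(P \mid \psi)
  < \frac{\epsilon}{2}, \qquad n = 1, 2, \ldots \tag{4.2.37}$$

Having fixed $a$, choose a positive integer $\nu$ such that

$$|p_{ij}^{(n)} - \pi_j(P)| < \frac{\epsilon}{2}$$

for all $n > \nu$ and all $P \in \mathcal{S}_N^{a}$. Then

$$\int_{\mathcal{S}_N^{a}} |p_{ij}^{(n)} - \pi_j(P)| \, dF(P \mid \psi)
  < \frac{\epsilon}{2}, \qquad n > \nu, \tag{4.2.38}$$

and equations (4.2.35), (4.2.37), and (4.2.38) yield

$$|\bar{p}_{ij}^{(n)}(\psi) - \bar{\pi}_j(\psi)| < \epsilon,
  \qquad n > \nu. \tag{4.2.39}$$

Q.E.D.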



Theorem 4.2.5 shows that we can approximate $\bar{\pi}_j(\psi)$ by
$\bar{p}_{ij}^{(n)}(\psi)$ using equation (4.1.2). A recursive program¹
was written to carry out this approximation when $\tilde{P}$ has a
matrix beta distribution. Some sample computations of
$E[\tilde{P}^{n} \mid \psi]$ are displayed in Tables 4.2.1-4.2.3. We
have shown at the base of each table the parameter $\psi$ of the prior
distribution of $\tilde{P}$ and the mean, $\bar{P}$, of this
distribution. The matrix $V(\tilde{P})$ which appears below the table
has as its $(i,j)$th element the prior variance of $\tilde{p}_{ij}$.
Silver [38] has conjectured that $\bar{\pi}(\psi)$ can be approximated
reasonably well by $\pi(\bar{P})$, the steady-state probability vector
corresponding to $\bar{P}(\psi)$, the mean of the prior distribution.
This approximation is also given with each table. All tests were
performed on an IBM 7094 computer.
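The mean and variance matrices shown at the base of each table can be reproduced from the parameter $\psi$ alone. The sketch below assumes that each row of a matrix beta distributed $\tilde{P}$ is an independent Dirichlet vector with parameters given by the corresponding row of $\psi$, so that the usual Dirichlet mean and variance formulas apply; the function name is hypothetical and the parameter values are those listed with Table 4.2.2.

```python
def matrix_beta_moments(psi):
    """Return (mean, variance) matrices for independent Dirichlet rows."""
    means, variances = [], []
    for row in psi:
        total = sum(row)
        mean_row = [m / total for m in row]
        means.append(mean_row)
        # Dirichlet marginal variance: p (1 - p) / (row total + 1).
        variances.append([p * (1.0 - p) / (total + 1.0) for p in mean_row])
    return means, variances

psi = [[0.251, 0.352], [0.617, 1.120]]
P_bar, V = matrix_beta_moments(psi)
```

With these values the computed means and variances agree with the $\bar{P}$ and $V(\tilde{P})$ entries printed with Table 4.2.2 to the accuracy shown there, which supports the Dirichlet-row reading of the prior.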

In Table 4.2.1, where a $2 \times 2$ transition matrix is considered, it
is seen that $\bar{\pi}_j(\psi)$ is obtained with three-place accuracy
for $n = 5$ and with five-place accuracy for $n = 8$. In this instance,
$\bar{p}_{ij}^{(n)}(\psi)$ converges monotonically to
$\bar{\pi}_j(\psi)$. The total time required to compute the eight
entries of Table 4.2.1 was 0.70 minutes.

In Table 4.2.2, a $2 \times 2$ transition matrix is treated which has
prior variances which are larger than those of the matrix considered in
Table 4.2.1. In this instance convergence of $\bar{p}_{ij}^{(n)}(\psi)$
to $\bar{\pi}_j(\psi)$ is much slower

¹ See Appendix C for the program listing.



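    n          $E[\tilde{P}^{n} \mid \psi]$

    1      [ 0.93454   0.06546 ]
           [ 0.86357   0.13643 ]
    2      [ 0.93370   0.06630 ]
           [ 0.92027   0.07973 ]
    3      [ 0.93297   0.06703 ]
           [ 0.93096   0.06904 ]
    4      [ 0.93296   0.06704 ]
           [ 0.9324?   0.0675? ]
    5      [ 0.93293   0.06707 ]
           [ 0.93283   0.06717 ]
    6      [ 0.93293   0.06707 ]
           [ 0.93290   0.06710 ]
    7      [ 0.93293   0.06707 ]
           [ 0.93292   0.06708 ]
    8      [ 0.93293   0.06707 ]
           [ 0.93293   0.06707 ]

    $\psi$: partly illegible in this copy (legible fragments: .105 in
    row 1, .313 in row 2)

    $\pi(\bar{P})$ = (0.92954  0.07046)

    Computation time: 0.70 minutes.

    $\bar{P}$ = [ 0.93454   0.06546 ]
                [ 0.86357   0.13643 ]

    $V(\tilde{P})$ = [ 0.0038   0.0038 ]
                     [ 0.0048   0.0048 ]

    (Garbled entries restored from the requirement that each row sum to
    one; ? marks a digit that could not be recovered.)

                        Table 4.2.1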



    n          $E[\tilde{P}^{n} \mid \psi]$

    1      [ 0.41625   0.58375 ]
           [ 0.35521   0.64479 ]
    2      [ 0.53220   0.46780 ]
           [ 0.29321   0.70679 ]
    3      [ 0.43?70   0.56?30 ]
           [ 0.38086   0.61914 ]
    4      [ 0.47832   0.52168 ]
           [ 0.34743   0.65257 ]
    5      [ 0.43084   0.56916 ]
           [ 0.39157   0.60843 ]
    6      [ 0.45896   0.54104 ]
           [ 0.36?73   0.63?27 ]
    7      [ 0.42989   0.57011 ]
           [ 0.39759   0.60241 ]
    8      [ 0.44?18   0.55?82 ]
           [ 0.38179   0.61822 ]
    9      [ 0.42895   0.57105 ]
           [ 0.40150   (lost)  ]
   10      [ 0.44333   0.55667 ]
           [ 0.38931   0.61069 ]
   11      [ 0.42?15   0.57?85 ]
           [ 0.40624   0.59376 ]
   12      [ 0.43945   0.56055 ]
           [ 0.3??45   (lost)  ]
   13      [ (lost)    0.57251 ]
           [ 0.40629   0.59371 ]

    $\psi$ = [ 0.251   0.352 ]
             [ 0.617   1.120 ]

    $\bar{P}$ = [ 0.41625   0.58375 ]
                [ 0.35521   0.64479 ]

    $\pi(\bar{P})$ = (0.37830  0.62170)

    $V(\tilde{P})$ = [ 0.1516   0.1516 ]
                     [ 0.0837   0.0837 ]

    Computation time: 5.75 minutes.

    (Garbled entries restored from the requirement that each row sum to
    one; ? marks a digit that could not be recovered, and (lost) an
    entry missing from this copy.)

                        Table 4.2.2



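    n          $E[\tilde{P}^{n} \mid \psi]$

    ?      [ 0.60196   0.2122?   0.18560 ]
           [ 0.1975?   0.55168   0.25??? ]
           [ 0.1925?   0.32211   0.?8?92 ]
    ?      [ 0.?5155   0.29867   0.????? ]
           [ 0.2705?   (lost)    0.29222 ]
           [ 0.26785   0.3631?   0.3??01 ]
    ?      [ 0.388?1   0.33563   0.27596 ]
           [ 0.30102   0.39882   0.30016 ]
           [ 0.29958   0.37???   0.32602 ]
    ?      [ 0.35970   0.35256   0.2877? ]
           [ 0.31???   0.38373   0.3011? ]
           [ 0.31?10   0.3???5   0.31185 ]
    ?      [ 0.35155   0.35691   0.2915? ]
           [ 0.31896   0.37371   0.30733 ]
           [ 0.31602   0.3809?   0.30106 ]

    $\psi$ = [ 18.265    2.102    3.910 ]
             [  2.385    5.168   10.111 ]
             [  1.005    7.612    1.212 ]

    $\bar{P}$ = [ 0.75236   0.08658   0.16106 ]
                [ 0.13502   0.29257   0.57241 ]
                [ 0.10225   0.77444   0.12331 ]

    $\pi(\bar{P})$ = (0.32697  0.37084  0.30219)

    $V(\tilde{P})$ = [ 0.007?   0.0031   0.0053 ]
                     [ 0.0063   0.0111   0.0131 ]
                     [ 0.0085   0.0161   0.0100 ]

    Computation times: illegible ($n = 1, \ldots, 8$);
                       3.26, unit illegible ($n = 9$).

    (The values of n attached to the five matrices are illegible in
    this copy. ? marks a digit that could not be recovered, and (lost)
    an entry missing from this copy. The third column of $\psi$ is
    taken from the identical prior listed with Table 4.2.6.)

                        Table 4.2.3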



and is not monotonic. For $n = 13$, $\bar{p}_{11}^{(13)}(\psi)$ and
$\bar{p}_{21}^{(13)}(\psi)$ agree only in the first decimal place. The
13 entries of this table took a total of 5.75 minutes to compute.

Some sample computations for a three-state process are shown in Table
£».2.3» B3.70 minutes ^bfo required to octapxte the first eight ©nts&as in 
t&laewe. Xto oqntotfc. ot B if lg] *•*»* 3.tf *»*•. Cbr^,^ 

is slow and is not monotonic.

We remark that the computation time of $\bar{p}_{ij}^{(n)}(\psi)$
increases exponentially with $n$ and linearly with $N$.

4.2.3  The numerical calculation of $\bar{\pi}_j(\psi)$ is a problem of
some difficulty. To obtain an explicit formula for $\bar{\pi}_j(\psi)$
in terms of $\psi$ is even more difficult; this general problem has not
yet been solved. Silver [38], assuming a matrix beta distribution for
$\tilde{P}$, has calculated $\bar{\pi}_j(\psi)$ for various parameters,
$\psi$, using Monte Carlo techniques. He has also shown that, for a
two-state chain with one row of $\tilde{P}$ known with certainty and a
beta distribution on the other row, the expected value of
$\tilde{\pi}_i$ is a Gaussian hypergeometric function. This result is
generalized in Section 6.5, where a series expansion of
$\bar{\pi}_j(\psi)$ is obtained when the $2 \times 2$ random matrix
$\tilde{P}$ has the matrix beta distribution with parameter $\psi$.
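A Monte Carlo estimate of the kind mentioned above is simple to sketch for a two-state chain. The code below is an added illustration and is not a reconstruction of Silver's procedure. It assumes independent Dirichlet rows for the matrix beta prior, draws each row by normalizing gamma variates, and uses the closed form $\pi_1 = p_{21} / (p_{12} + p_{21})$. The function names, sample size, and seed are arbitrary choices; the parameter values are those listed with Table 4.2.2.

```python
import random

def sample_row(params, rng):
    """Draw one row of P from a Dirichlet law by normalizing gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in params]
    total = sum(draws)
    return [d / total for d in draws]

def monte_carlo_pi(psi, samples, seed=0):
    """Estimate E[pi_1], E[pi_2] and Var[pi_1] for a two-state chain."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(samples):
        p12 = sample_row(psi[0], rng)[1]
        p21 = sample_row(psi[1], rng)[0]
        pi1 = p21 / (p12 + p21)  # steady-state probability of state 1
        total += pi1
        total_sq += pi1 * pi1
    mean1 = total / samples
    variance1 = total_sq / samples - mean1 * mean1
    return (mean1, 1.0 - mean1), variance1

psi = [[0.251, 0.352], [0.617, 1.120]]
pi_hat, var_hat = monte_carlo_pi(psi, 40000)
```

In the two-state case $\tilde{\pi}_2 = 1 - \tilde{\pi}_1$, so the covariance of $\tilde{\pi}_1$ and $\tilde{\pi}_2$ is simply the negative of the estimated variance. The sampling error of such an estimate falls only as the inverse square root of the number of samples, which is one reason the recursive methods of this section are of interest.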

One method of computing $\bar{\pi}_j(\psi)$ is to use the ergodic
theorem of

is provided in the next theorem.



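Theorem 4.2.6 If $\tilde{P}$ has the distribution function
$F(P \mid \psi) \in \mathcal{F}$, where $\mathcal{F}$ is closed under
consecutive sampling and is continuous on the boundary of
$\mathcal{S}_N$, then the expectations $\bar{\pi}_j(\psi)$
simultaneously satisfy the functional equations

$$\bar{\pi}_j(\psi) = \sum_{k=1}^{N} \bar{p}_{kj}(\psi)\,
  \bar{\pi}_k(T_{kj}(\psi)), \qquad j = 1, \ldots, N; \; \psi \in \Psi, \tag{4.2.40a}$$

together with

$$\sum_{j=1}^{N} \bar{\pi}_j(\psi) = 1, \qquad \psi \in \Psi. \tag{4.2.40b}$$

Remark. The condition (4.2.40b) is necessary to insure a unique
solution to (4.2.40a) since, if $\bar{\pi}(\psi)$ satisfies (4.2.40a),
$c\,\bar{\pi}(\psi)$ also satisfies (4.2.40a) for all real numbers $c$.
Even with this additional constraint we have been unable to prove that
$E[\tilde{\pi} \mid \psi]$ is the unique solution to (4.2.40), although
we conjecture that this is true.

Proof. For $j = 1, \ldots, N$ and $\psi \in \Psi$, using (4.2.1a) and
Lemma 2.3.2,

$$\bar{\pi}_j(\psi)
  = \int_{\mathcal{S}_N^{*}} \sum_{k=1}^{N} \pi_k(P)\, p_{kj} \, dF(P \mid \psi)
  = \sum_{k=1}^{N} \bar{p}_{kj}(\psi)\, \bar{\pi}_k(T_{kj}(\psi)),$$

which is (4.2.40a). Summing (4.2.33) over $j$ yields (4.2.40b). The
integrals involved exist by virtue of Lemma 4.2.1 and the continuity of
$\pi_j(P)$ on $\mathcal{S}_N^{*}$. Q.E.D.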

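Let the vector function,
$\bar{\pi}(n, \psi) = (\bar{\pi}_1(n, \psi), \ldots, \bar{\pi}_N(n, \psi))$,
be defined by the equations

$$\bar{\pi}_j(n, \psi) = \sum_{k=1}^{N} \bar{p}_{kj}(\psi)\,
  \bar{\pi}_k(n - 1, T_{kj}(\psi)),
  \qquad j = 1, \ldots, N; \; n = 1, 2, 3, \ldots \tag{4.2.42}$$

together with a terminal function
$\bar{\pi}(0, \psi) = (\bar{\pi}_1(0, \psi), \ldots, \bar{\pi}_N(0, \psi))$
which satisfies the conditions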



f e 5: 



N 



i-i * 
The function $\bar{p}_{kj}(\psi)$ is the $(k,j)$th element of the
expected value of $\tilde{P}$ when $F(P \mid \psi)$ is the prior
distribution function of $\tilde{P}$.

If $\lim_{n \to \infty} \bar{\pi}(n, \psi)$ exists, then this limit
satisfies equation (4.2.40a). The sum
$C(n, \psi) = \sum_{j=1}^{N} \bar{\pi}_j(n, \psi)$ need not equal
unity. However, if $\lim_{n \to \infty} \bar{\pi}(n, \psi)$ exists,
then $\lim_{n \to \infty} \bar{\pi}(n, \psi) / C(n, \psi)$ exists and
satisfies equations (4.2.40a) and (4.2.40b). Necessary conditions for
the convergence of $\bar{\pi}(n, \psi)$ have not yet been found; a
sufficient condition is given by the following theorem.

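Theorem 4.2.7 Let $\tilde{P}$ have the prior distribution function
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under
consecutive sampling which is continuous on the boundary of
$\mathcal{S}_N$. Let $\bar{\pi}(n, \psi)$ be defined by equation
(4.2.42) with the constant terminal functions

$$\bar{\pi}_i(0, \psi) = \rho_i, \qquad i = 1, \ldots, N, \tag{4.2.44}$$

where $\rho = (\rho_1, \ldots, \rho_N)$ is a stochastic vector. Then,
for $j = 1, \ldots, N$, $\lim_{n \to \infty} \bar{\pi}_j(n, \psi)$
exists and is equal to $E[\tilde{\pi}_j \mid \psi]$. Moreover,

$$\sum_{j=1}^{N} \bar{\pi}_j(n, \psi) = 1,
  \qquad \psi \in \Psi; \; n = 1, 2, \ldots \tag{4.2.45}$$

Thus, $\lim_{n \to \infty} \bar{\pi}(n, \psi)$ satisfies equation
(4.2.40).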

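Proof. The theorem is proved by showing that

$$\bar{\pi}_j(n, \psi) = \sum_{i=1}^{N} \rho_i\, \bar{p}_{ij}^{(n)}(\psi),
  \qquad j = 1, \ldots, N; \; n = 1, 2, 3, \ldots \tag{4.2.46}$$

from which it follows, by Theorem 4.2.5, that

$$\lim_{n \to \infty} \bar{\pi}_j(n, \psi)
  = \sum_{i=1}^{N} \rho_i\, \bar{\pi}_j(\psi)
  = E[\tilde{\pi}_j \mid \psi], \qquad j = 1, \ldots, N. \tag{4.2.47}$$

Equation (4.2.46) is established inductively. For $n = 1$,
$\bar{\pi}_j(1, \psi) = \sum_{i=1}^{N} \rho_i\, \bar{p}_{ij}(\psi)$ and
(4.2.46) holds. Assume it is true for $n$. Then, using equation
(4.2.42) and Lemma 2.3.2,

$$\bar{\pi}_j(n+1, \psi)
  = \sum_{k=1}^{N} \bar{\pi}_k(n, T_{kj}(\psi))\, \bar{p}_{kj}(\psi)
  = \sum_{i=1}^{N} \rho_i \sum_{k=1}^{N} \bar{p}_{kj}(\psi)\,
      \bar{p}_{ik}^{(n)}(T_{kj}(\psi))
  = \sum_{i=1}^{N} \rho_i\, \bar{p}_{ij}^{(n+1)}(\psi), \tag{4.2.48}$$

proving the assertion. Summing (4.2.46) over $j$, we have
$\sum_{j=1}^{N} \bar{\pi}_j(n, \psi) = 1$
($n = 1, 2, \ldots$; $\psi \in \Psi$), since

$$\sum_{j=1}^{N} \bar{p}_{ij}^{(n)}(\psi)
  = \int_{\mathcal{S}_N} \sum_{j=1}^{N} p_{ij}^{(n)} \, dF(P \mid \psi) = 1,
  \qquad i = 1, \ldots, N; \; n = 1, 2, \ldots$$

Q.E.D.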

By letting $\rho_i = \delta_{ik}$ for some fixed index $k$, equation
(4.2.46) becomes

$$\bar{\pi}_j(n, \psi) = \bar{p}_{kj}^{(n)}(\psi),
  \qquad j = 1, \ldots, N, \tag{4.2.50}$$

and it is seen that the approximation of $\bar{\pi}_j(\psi)$ by
$\bar{p}_{kj}^{(n)}(\psi)$ is a special case of the method of
successive approximations defined by (4.2.42) and (4.2.44).

Another approximation of interest is based upon (4.2.42) and the
terminal function $\bar{\pi}(0, \psi)$ defined by

$$\bar{\pi}_i(0, \psi) = \pi_i(\bar{P}(\psi)),
  \qquad i = 1, \ldots, N, \tag{4.2.51}$$

where $\pi(\bar{P}(\psi))$ is the steady-state probability vector
corresponding to $\bar{P}(\psi)$, the mean of the distribution function
$F(P \mid \psi)$. We have not been able to prove convergence of
$\bar{\pi}(n, \psi)$ in this case, but limited computational experience
with this approximation, using a matrix beta prior distribution,
suggests that convergence does occur and, in some cases, is more rapid
than the convergence of $\bar{p}_{ij}^{(n)}(\psi)$.

Some numerical results based on the recursive programming of (4.2.42)
with the terminal functions (4.2.51) are displayed in Tables
4.2.4-4.2.6. A matrix beta prior distribution was used in all cases.
The program is given in Appendix D.
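The successive approximation scheme can be sketched as follows for a two-state chain. This is a minimal illustration and not the program of Appendix D. It assumes independent Dirichlet rows with the add-one posterior rule, and that each iterate is obtained by conditioning on the final transition $k \to j$ with mean probability $\bar{p}_{kj}(\psi)$ and updated parameter $T_{kj}(\psi)$. The function names are hypothetical; the parameter values are those listed with Table 4.2.5.

```python
def mean_matrix(psi):
    """Prior mean of each p_ij under the assumed Dirichlet-row form."""
    return [[m / sum(row) for m in row] for row in psi]

def updated(psi, k, j):
    """T_kj(psi): parameter after one observed transition k -> j."""
    rows = [list(row) for row in psi]
    rows[k][j] += 1.0
    return rows

def steady_state_of_mean(psi):
    """Terminal function pi(P_bar(psi)) for a two-state chain."""
    pbar = mean_matrix(psi)
    denom = pbar[0][1] + pbar[1][0]
    return [pbar[1][0] / denom, pbar[0][1] / denom]

def approximate(psi, n, terminal):
    """n-th successive approximation to the expected steady-state vector."""
    if n == 0:
        return terminal(psi)
    size = len(psi)
    pbar = mean_matrix(psi)
    result = [0.0] * size
    for j in range(size):
        for k in range(size):
            # Condition on the final transition k -> j.
            inner = approximate(updated(psi, k, j), n - 1, terminal)
            result[j] += pbar[k][j] * inner[k]
    return result

psi = [[0.251, 0.352], [0.617, 1.120]]
first = approximate(psi, 1, steady_state_of_mean)
normalizer = sum(first)
normalized = [x / normalizer for x in first]
```

Under these assumptions the first iterate, its sum, and the normalized vector agree with the $n = 1$ row of Table 4.2.5 to the printed accuracy. With a constant terminal vector such as $(1, 0)$ the same routine returns the first row of the expected $n$-step matrix, which is the special case noted in (4.2.50).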

A two-state process is considered in Table 4.2.4. The transition matrix
has the same prior distribution as was used to compute Table 4.2.1,
where it was seen that $\bar{\pi}(\psi) = (0.93293 \;\; 0.06707)$. In
Table 4.2.4 the approximation $\bar{\pi}(n, \psi)$ defined by (4.2.42)
is given in column two and the normalizing constant,
$C(n, \psi) = \bar{\pi}_1(n, \psi) + \bar{\pi}_2(n, \psi)$, is given in
column three. In column four it is seen that
$C(n, \psi)^{-1}\, \bar{\pi}(n, \psi) \approx \bar{\pi}(\psi)$, with
three-place accuracy on the first iteration and four-place accuracy on
the second iteration. The eight entries of Table 4.2.4 required 0.62
minutes of computation time on an IBM 7094 machine.

In Table 4.2.5 a $2 \times 2$ transition matrix is treated which has the
same prior distribution as the matrix considered in Table 4.2.2. This
is a relatively loose prior distribution and it is seen that
convergence of $C(n, \psi)^{-1}\, \bar{\pi}(n, \psi)$ is slow, although
comparison with Table 4.2.2 indicates that this approximation



    n    $\bar{\pi}(n,\psi)$     $C(n,\psi)$   $C(n,\psi)^{-1}\,\bar{\pi}(n,\psi)$

    1   (0.93301  0.06724)   1.00025   (0.93278  0.06722)
    2   (0.93311  0.06713)   1.00024   (0.93289  0.06711)
    3   (0.93313  0.06710)   1.00023   (0.93292  0.06708)
    4   (0.93311  0.06709)   1.00020   (0.93292  0.06708)
    5   (0.93310  0.06709)   1.00019   (0.93292  0.06708)
    6   (0.93308  0.06709)   1.00017   (0.93292  0.06708)
    7   (0.93307  0.06708)   1.00015   (0.93293  0.06707)
    8   (0.93306  0.06708)   1.00014   (0.93293  0.06707)

    $\psi$: partly illegible in this copy (legible fragments: .105 in
    row 1, .313 in row 2)

    $\bar{P}$ = [ 0.93454   0.06546 ]
                [ 0.86357   0.13643 ]

    $V(\tilde{P})$ = [ 0.0038   0.0038 ]
                     [ 0.0048   0.0048 ]

    Computation time: 0.62 minutes.

    (Garbled digits restored from the relations between the three
    columns.)

                        Table 4.2.4



    n    $\bar{\pi}(n,\psi)$     $C(n,\psi)$   $C(n,\psi)^{-1}\,\bar{\pi}(n,\psi)$

    1   (0.43377  0.63815)   1.07192   (0.40467  0.59533)
    2   (0.44423  0.62777)   1.07200   (0.41439  0.58561)
    3   (0.44447  0.62433)   1.06880   (0.41586  0.58414)
    4   (0.44443  0.61940)   1.06383   (0.41776  0.58224)
    5   (0.44308  0.61632)   1.05940   (0.41824  0.58176)
    6   (0.44211  0.61317)   1.05528   (0.41895  0.58105)
    7   (0.44088  0.61082)   1.05170   (0.41921  0.58079)
    8   (0.43992  0.60858)   1.04850   (0.41957  0.58043)
    9   (0.43890  0.60679)   1.04569   (0.41972  0.58028)
   10   (0.43808  0.60511)   1.04319   (0.41994  0.58006)
   11   (0.43725  0.60370)   1.04095   (0.42005  0.57995)
   12   (0.43656  0.60238)   1.03894   (0.42020  0.57980)
   13   (0.43588  0.60124)   1.03712   (0.42028  0.57972)
   14   (0.43529  0.60018)   1.03547   (0.42038  0.57962)

    $\psi$ = [ 0.251   0.352 ]
             [ 0.617   1.120 ]

    $\bar{P}$ = [ 0.41625   0.58375 ]
                [ 0.35521   0.64479 ]

    $V(\tilde{P})$ = [ 0.1516   0.1516 ]
                     [ 0.0837   0.0837 ]

    Computation times: 5.03 minutes ($n = 1, \ldots, 12$);
                       15.03 minutes ($n = 13, 14$).

    (Garbled digits restored from the relations between the three
    columns.)

                        Table 4.2.5



    n       $\bar{\pi}(n,\psi)$          $C(n,\psi)$    $C(n,\psi)^{-1}\,\bar{\pi}(n,\psi)$

    1   (0.32841  0.37234  0.30058)   1.00133   (0.32797  0.37185  0.30018)
    2   (0.32915  0.37145  0.30142)   1.00202   (0.32849  0.37070  0.30081)
    3   (0.32963  0.37185  0.30100)   1.00248   (0.32881  0.37093  0.30026)
    4   (0.32990  0.37160  0.30123)   1.00273   (0.32900  0.37059  0.30041)
    5   (0.33006  0.37171  0.30110)   1.00286   (0.32912  0.3706?  0.3002?)
    6   (0.33017  0.37162  0.30117)   1.00296   (0.32920  0.37052  0.30028)
    7   (0.33023  0.37164  0.30112)   1.00299   (0.3292?  0.3705?  0.30022)
    8   (0.33025  0.37160  0.30113)   1.00298   (0.32927  0.37050  0.30023)
    9   (0.33026  0.37160  0.30110)   1.00296   (0.32929  0.37050  0.30021)

    $\psi$ = [ 18.265    2.102    3.910 ]
             [  2.385    5.168   10.111 ]
             [  1.005    7.612    1.212 ]

    $\bar{P}$ = [ 0.75236   0.08658   0.16106 ]
                [ 0.13502   0.29257   0.57241 ]
                [ 0.10225   0.77444   0.12331 ]

    $V(\tilde{P})$ = [ 0.007?   0.0031   0.0053 ]
                     [ 0.0063   0.0111   0.0131 ]
                     [ 0.0085   0.0161   0.0100 ]

    Computation time: 2.15 minutes ($n = 1, \ldots, 7$).

    (Garbled digits restored from the relations between the three
    columns; ? marks a digit that could not be recovered.)

                        Table 4.2.6



has smaller error than the approximation $\bar{p}_{ij}^{(n)}(\psi)$.
The first twelve iterations required a total of 5.03 minutes, while
iterations thirteen and fourteen consumed 15.03 minutes, illustrating
the exponential growth of the computation time with $n$.

A $3 \times 3$ transition matrix which has the same prior distribution
as was used in computing Table 4.2.3 is considered in Table 4.2.6. In
this case, $C(n, \psi)^{-1}\, \bar{\pi}(n, \psi)$ converges faster than
the approximation $\bar{p}_{ij}^{(n)}(\psi)$. Two-place accuracy is
achieved on the first iteration, with three-place accuracy on the third
iteration. The computation time for the first seven entries was 2.15
minutes; the time for entries 8 and 9 is not available.



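4.2.4  Let

$$\bar{\pi}_{ij}(\psi) = \int_{\mathcal{S}_N^{*}} \pi_i(P)\, \pi_j(P) \, dF(P \mid \psi) \tag{4.2.52}$$

be the expected value of $\tilde{\pi}_i \tilde{\pi}_j$ when $\tilde{P}$
has the distribution function $F(P \mid \psi)$. If $F(P \mid \psi)$ is
continuous on the boundary of $\mathcal{S}_N$, Theorem 4.2.2 implies the
existence of the integral (4.2.52). If $F(P \mid \psi) \in \mathcal{F}$,
a family of distributions continuous in $\psi$, then Theorem 2.4.3
implies that $\bar{\pi}_{ij}(\psi)$ is a continuous function of $\psi$.
When $\mathcal{F}$ is also closed under the consecutive sampling rule,
the following theorem shows that $\bar{\pi}_{ij}(\psi)$ can be
approximated by
$E[\tilde{p}_{\alpha i}^{(n)}\, \tilde{p}_{\gamma j}^{(\nu)} \mid \psi]$.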



ai 



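Theorem 4.2.8 If the prior distribution function of $\tilde{P}$ is
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous
on the boundary of $\mathcal{S}_N$ which is closed under consecutive
sampling, then

$$\lim_{n \to \infty,\; \nu \to \infty}
  E[\tilde{p}_{\alpha i}^{(n)}\, \tilde{p}_{\gamma j}^{(\nu)} \mid \psi]
  = \bar{\pi}_{ij}(\psi). \tag{4.2.53}$$

Proof. Let $\epsilon > 0$ be given. By Lemma 4.2.4,

$$\lim_{n \to \infty,\; \nu \to \infty}
  p_{\alpha i}^{(n)}\, p_{\gamma j}^{(\nu)} = \pi_i(P)\, \pi_j(P)$$

uniformly on $\mathcal{S}_N^{a}$ for any $a \in (0,1)$. Arguing as in
the proof of Theorem 4.2.5, we may choose $n$ and $\nu$ sufficiently
large and $a > 0$ sufficiently small that

$$\left| E[\tilde{p}_{\alpha i}^{(n)}\, \tilde{p}_{\gamma j}^{(\nu)} \mid \psi]
  - \bar{\pi}_{ij}(\psi) \right| < \epsilon,$$

proving the theorem. Q.E.D.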

Theorem 4.2.9 If the prior distribution function of $\tilde{P}$ is
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous
on the boundary of $\mathcal{S}_N$ which is closed under consecutive
sampling, then the product moment $\bar{\pi}_{ij}(\psi)$ satisfies the
following functional equations:

iy^B&y • ••» $J 

N » 

iwl Js»i *J 

Remark. The condition (4.2.55b) is necessary to insure a unique
solution to the functional equation (4.2.55a). Sufficient conditions
for a unique solution have not yet been found.

Proof. Since

$$\pi_i(P)\, \pi_j(P) = \sum_{\alpha=1}^{N} \sum_{\beta=1}^{N}
  \pi_\alpha(P)\, \pi_\beta(P)\, p_{\alpha i}\, p_{\beta j}, \tag{4.2.56}$$

equation (4.2.55a) follows from Lemma 2.3.2. Equation (4.2.55b) follows
by summing (4.2.52) over $i$ and $j$. Q.E.D.

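Given $\bar{\pi}_{ij}(\psi)$, $\bar{\pi}_i(\psi)$, and
$\bar{\pi}_j(\psi)$, the covariance between $\tilde{\pi}_i$ and
$\tilde{\pi}_j$ is computed from the relation

$$\mathrm{cov}(\tilde{\pi}_i, \tilde{\pi}_j \mid \psi)
  = \bar{\pi}_{ij}(\psi) - \bar{\pi}_i(\psi)\, \bar{\pi}_j(\psi). \tag{4.2.57}$$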

Consider a Markov chain which is operated indefinitely under a fixed
policy with initial state $i$. If $\tilde{P} = P$, let $V_i(P)$ be the
conditional expectation of the total discounted reward earned over an
infinite period when the system starts in state $i$ and let
$V(P) = (V_1(P), \ldots, V_N(P))$ be the vector of expected discounted
rewards. Howard¹ has shown that, for any $P \in \mathcal{S}_N$,
including periodic and multiple-chain transition matrices,

$$V_i(P) = \sum_{n=0}^{\infty} \beta^{n} \sum_{j=1}^{N} \sum_{k=1}^{N}
  p_{ij}^{(n)}\, p_{jk}\, r_{jk},
  \qquad i = 1, \ldots, N; \; 0 \le \beta < 1. \tag{4.3.1}$$

When $\tilde{P}$ is a random matrix $V(\tilde{P})$ is a random vector.
In this section the mean and the variance-covariance matrix of
$V(\tilde{P})$ are studied and expressions for elements of these
unconditional expectations are found. A set of functional equations for
the expected value of $V_i(\tilde{P})$ is derived which is closely
related to equation (3.1.5), which was discussed in connection with the
discounted adaptive control problem. This relation is





C 2], p.82. 




used to obtain a method of successive approximations for the numerical
calculation of the expected value of $V_i(\tilde{P})$. Some examples
are presented in Section 4.3.3.

4.3.1 Expected Value of $V_i(\tilde{P})$. Let $\tilde{P}$ have the prior
distribution function $F(P \mid \psi)$ and let

$$\bar{V}_i(\psi) = \int_{\mathcal{S}_N} V_i(P) \, dF(P \mid \psi),
  \qquad i = 1, \ldots, N, \tag{4.3.2}$$

be the expected value of $V_i(\tilde{P})$. The first theorem shows that
this expectation exists and provides a formula for $\bar{V}_i(\psi)$ in
terms of the

distribution belongs to a family closed under consecutive sampling. A
preliminary lemma is required.



Lemma 4.3.1 If $V_i(P)$ is defined by the infinite series (4.3.1) with
$0 \le \beta < 1$, then the series converges uniformly in $P$ and
$V_i(P)$ is continuous on $\mathcal{S}_N$ ($i = 1, \ldots, N$).

Proof. For any finite $n$, the functions

^ ^* fed tni id Jc * 



fed && 



i&ip • • • ^ fa 



are continuous functions of $P$ on $\mathcal{S}_N$. Moreover, if

the functions $v_i(n, P)$ are bounded on $\mathcal{S}_N$,

is ft 



Since

for $0 \le \beta < 1$, $\sum_{n=0}^{\infty} v_i(n, P)$ converges
uniformly to $V_i(P)$ on $\mathcal{S}_N$ and $V_i(P)$ is a continuous
function of $P$ on $\mathcal{S}_N$. Q.E.D.

Theorem 4.3.2 Let $\tilde{P}$ have the prior distribution function
$F(P \mid \psi)$. Then the expectation $\bar{V}_i(\psi)$ defined by
equation (4.3.2) exists. If $F(P \mid \psi) \in \mathcal{F}$, a family
of distributions closed under the consecutive sampling rule, then

1 n*G £4 teA 15 # * # 

lot, .## e H 

where $\bar{p}_{ij}^{(n)}(\psi)$ is defined by equation (4.1.1).

Proof. The integral (4.3.2) exists by virtue of the continuity of the
bounded function $V_i(P)$ on $\mathcal{S}_N$. Since the infinite series
(4.3.1) converges uniformly,¹

$$\bar{V}_i(\psi) = \sum_{n=0}^{\infty} \beta^{n} \sum_{j=1}^{N} \sum_{k=1}^{N}
  r_{jk} \int_{\mathcal{S}_N} p_{ij}^{(n)}\, p_{jk} \, dF(P \mid \psi). \tag{4.3.8}$$

If $\mathcal{F}$ is closed under consecutive sampling, Lemma 2.3.2
yields (4.3.7). Q.E.D.

In Section 4.3.2 we shall discuss approximations to $\bar{V}_i(\psi)$
which are based on equation (4.3.7). The results of the following
paragraph provide a different basis for computation of
$\bar{V}_i(\psi)$.



4.3.2 A Functional Equation for $\bar{V}(\psi)$. We now relate $\bar{V}_i(\psi)$ to



Min C353* p. i3«*. 



$v_i(\psi)$, the maximum expected discounted reward discussed in
connection with the adaptive control problem.

Theorem 4.3.3 If $\tilde{P}$ has the prior distribution function
$F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under
consecutive sampling, then
$\bar{V}(\psi) = (\bar{V}_1(\psi), \ldots, \bar{V}_N(\psi))$ satisfies
the following set of simultaneous functional equations,

$$\bar{V}_i(\psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N}
  \bar{p}_{ij}(\psi)\, \bar{V}_j(T_{ij}(\psi)),
  \qquad i = 1, \ldots, N, \tag{4.3.9}$$

where $\bar{q}_i(\psi)$, the expected one-step transition reward when
in state $i$, is defined by equation (3.1.4).
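Proof. Letting

$$q_i(P) = \sum_{j=1}^{N} p_{ij}\, r_{ij}, \qquad i = 1, \ldots, N,$$

equation (4.3.1) can be written

$$V_i(P) = \sum_{n=0}^{\infty} \beta^{n} \sum_{j=1}^{N} p_{ij}^{(n)}\, q_j(P)
  = q_i(P) + \beta \sum_{k=1}^{N} p_{ik} \sum_{n=0}^{\infty} \beta^{n}
      \sum_{j=1}^{N} p_{kj}^{(n)}\, q_j(P)
  = q_i(P) + \beta \sum_{k=1}^{N} p_{ik}\, V_k(P).$$

Thus, changing the index of summation and using Lemma 2.3.2,

$$\bar{V}_i(\psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N}
  \bar{p}_{ij}(\psi)\, \bar{V}_j(T_{ij}(\psi)).$$

Q.E.D.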



Equation (4.3.9) has the same form as equation (3.1.5) and may be
interpreted as a discounted adaptive control equation in which there is
exactly one alternative in each state. The results of Sections 3.2 and
3.3 apply and are summarized in the following theorem, which is stated
without proof, since the proofs in the more general case of many
alternatives in each state are given in Chapter 3.

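Theorem 4.3.4 There exists a unique bounded vector function, $\bar{V}(\psi) = (\bar{V}_1(\psi), \dots, \bar{V}_N(\psi))$, which satisfies equation (4.3.9). Let the sequence of functions $\{\bar{V}_i(n, \psi)\}$, $i = 1, \dots, N$, be defined by the equations

$$\bar{V}_i(n+1, \psi) = \bar{q}_i(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}(\psi)\, \bar{V}_j(n, T_{ij}(\psi)), \qquad (4.3.13a)$$

$i = 1, \dots, N$; $n = 0, 1, 2, \dots$; $0 \le \beta < 1$,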
V°«f> a v.(r). i*i» ••••» <4,3.i3b) 

Then, provided the terminal functions $\bar{V}_i(0, \psi)$ are bounded, the sequence $\{\bar{V}_i(n, \psi)\}$ converges uniformly to $\bar{V}_i(\psi)$, the unique bounded solution of (4.3.9). If the terminal functions are constants, then the error of the $n$th approximant,

$$e_i(n, \psi) = \bar{V}_i(\psi) - \bar{V}_i(n, \psi), \qquad (4.3.16)$$

has the bound



e^f )| ^ 3 n C affix {^. v% V* • ^3* <^.3.1?) 



O*0<i 



If, furthermore,



V* . ^ ^ *, (*>.3«i8> 

then $\{\bar{V}_i(n, \psi)\}$ is a monotone increasing sequence with limit $\bar{V}_i(\psi)$ and
the bound (4.3.17) becomes

* e^n.*) £ P n ( ^ - v 9 ). (fc.3.19) 

if » « 

then $\{\bar{V}_i(n, \psi)\}$ is a monotone decreasing sequence with limit $\bar{V}_i(\psi)$ and

P° £ 1& • v *3 ^ «!<«• *> £ 0. i«i> ...» B (fc«3. 

0*£<i 
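The successive approximations of Theorem 4.3.4 can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the program of Appendix B: we assume a matrix beta prior whose parameter is a matrix of positive counts `M` (so that the mean transition matrix is `M` normalized by rows and the posterior parameter $T_{ij}(\psi)$ adds one to entry $(i, j)$), we take the terminal function to be the discounted value computed at the prior mean matrix, and the reward matrix `R` and the prior are hypothetical numbers.

```python
from functools import lru_cache

import numpy as np

BETA = 0.2                       # discount factor
R = np.array([[10.0, 4.0],       # hypothetical rewards r_ij (not the thesis data)
              [14.0, -1.0]])


def known_value(P, beta=BETA):
    """Discounted reward vector V(P) = (I - beta P)^{-1} q for a known P."""
    q = (P * R).sum(axis=1)
    return np.linalg.solve(np.eye(len(P)) - beta * P, q)


@lru_cache(maxsize=None)
def v_bar(counts, n):
    """n-th approximant; `counts` is the flattened (assumed) prior parameter."""
    M = np.array(counts, dtype=float).reshape(R.shape)
    P_mean = M / M.sum(axis=1, keepdims=True)
    if n == 0:
        # Assumed terminal function: value at the prior mean matrix.
        return tuple(known_value(P_mean))
    out = []
    for i in range(len(M)):
        total = P_mean[i] @ R[i]                 # expected one-step reward
        for j in range(len(M)):
            M_post = M.copy()
            M_post[i, j] += 1.0                  # posterior after one i -> j step
            total += BETA * P_mean[i, j] * v_bar(tuple(M_post.ravel()), n - 1)[j]
        out.append(total)
    return tuple(out)


prior = (2.0, 1.0, 1.0, 3.0)
for n in range(7):
    print(n, np.round(v_bar(prior, n), 3))
```

Memoizing on the pair (parameter, iteration number) keeps the work small, since many sample histories lead to the same posterior parameter. Successive approximants differ by at most a factor $\beta$ per iteration, which is the geometric behavior visible in the last column of the tables that follow.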

4.3.3 Numerical Example. To illustrate the above formulas, some sample calculations based on equation (4.3.13) are displayed in Tables 4.3.1 - 4.3.4. These tables contain values of

$$\bar{V}(n, \psi) = \begin{bmatrix} \bar{V}_1(n, \psi) \\ \bar{V}_2(n, \psi) \end{bmatrix}$$

under the four pure policies in a two-state Markov chain with two alternatives in each state when $\tilde{P}$ has a matrix beta distribution. The discount factor is $\beta = 0.2$. The reward matrix and the prior distribution are the same as

See Appendix B for the program listings.






Vfo^) 




1 

e 

& 

5 
6 

? 
8 

9 

&H$ s (l f l). 
G9s$ot&tion tides 0«| 



9.660 
5-596 

9.663 

5.596 

9*936 

5*5*J9 

9.982 

9*991 
5.5<* 

9.993 
5.5& 

9.993 
5.5*& 

9.99^ 
5.5^ 

9.99^ 

9.99«> 
5« 



§<a»fir) 



0.33& 
*»0.052 

0.331 

-0.052 

O.058 
-0.005 

0.012 
0.000 

0.003 

0.000 

0.001 

0.000 

0.001 
0.000 

0.000 
0.000 

0.000 
0.000 

0.000 
0.000 



V(O^) ■ 



Mn) 
U.90** 
2.381 
0.4?6 
0.095 
0.019 
0.004 
0.001 
0.000 
0.000 
0.000 




n&te&oa 








1 

2 
3 

5 
6 

? 

8 
9 



Vfof) 



10.253 

13*2?8 

10.2f& 
13.2?6 

10.&7O 

i3.f 



10*53? 

13*652 

10*516 
13*665 

10.517 

13.66? 

10.518 
13.668 

10.518 

13.668 

10.518 

13.668 

10.510 
13.663 



Fb31<^ s (1,2). 

tap tat&cn tfcaos 0*39 nlnafcesi 

P e 0«2* 



0.265 
0.390 

0.86k 
0.392 

0.0**8 
0.082 

0.011 
0.016 

0.002 
0.003 

0.001 
0.001 

0.000 
0*000 

0.000 
0.000 

0.000 
0.000 

0.000 
0.000 



A(n) 

1.9*9 

0*390 
0.078 
0*016 
0.003 
0.001 
0*000 
0*000 
0.000 



V<0»&) * 110.253] 
- S |l3.278| 



«flaKbji fr.q.S 







1 

2 

3 

5 
6 

7 
8 
9 



5<©,^> 



5.616 
5*473 

5*615 
5*^73 

5*585 

5*5?6 
5*&17 

5*5?** 
5**>!& 

5*573 
5*^1^ 

5*5?3 

5*5?3 
5*W 

5*573 



i mii'i. 



GbeTsj: station tiaes 0*55 stasias. 
P o 0.2. 



i(npf) 



-0.0&3 
•0.059 

-0.<&3 
-0.059 

-0.012 
-0.015 

-0.003 

-0.003 

-0.001 

0.000 

0.000 
0.000 

0.000 
0.000 

0.000 
0.000 

0.000 
0.000 

0.000 



vcotfi) 



A(n) 
12.02? 
2,405 
0«4Bt 
0.096 
0.019 
0«00fe 
0.001 
0.000 
0.000 
0.000 




•fttfUf* fr'M 






Sfcitjr) 



6.C&2 
12*708 

6.0&2 
12.?0? 



n 

1 
2 
3 

5 

6 

7 
8 
9 



SbHc^s (2,2). 

Ctapeiation Un»s 0.62 sAmtes. 
$ a 0.2. 



|Cno^) 



unit jUDinn 



5.988 
13.0^3 


-0.012 

0.05k 


5*999 

13.08& 


-0.001 
0.013 


6.000 
13.095 


0.000 

0.002 


6*000 
13*097 


0.000 
0.000 


6.000 
13.09? 


0.000 
0.000 


6.000 
13.09? 


0.000 

coco 


6.000 

13.09? 


0.000 
0.000 


6.000 
13.09? 


0.000 
0.000 



0.C&2 
0.389 

0.0212 
0*390 



V(0,« 



J3 



A(n) 

13.958 

2*792 

0.558 
0.112 

0.022 
O.OOfc 
0.001 
0.000 

0.000 
0.000 




J&£i&&tJa& 



those used in computing Tables 3.5.1 - 3.5.3 for the adaptive control problem (see Section 3.5). For each of the four possible policies $\sigma \in \Sigma$, the terminal functions are given by



the expected discounted reward starting from state $i$ when $\tilde{P}(\sigma) = \bar{P}(\sigma)$. It is seen that convergence takes place, in each case, by the seventh iteration when accuracy to the third decimal place is desired. The computation time indicated at the end of each table is the total time required to calculate all nine iterations on an IBM 7094 computer.
Also displayed in each table is the error vector,

$$e(n, \psi) = \bar{V}(\psi) - \bar{V}(n, \psi),$$

taking $\bar{V}(\psi) = \bar{V}(9, \psi)$. The last column of each table contains values of

ACn) » tf 1 £oas {^ - *\ f* - |jy"\ ]» 

the absolute error bound of equation (4.3.17).

Comparing these calculations with those of Tables 3.5.1 - 3.5.3, it is seen that, in this example, the adaptive control problem and the problem of choosing a terminal policy which maximizes $\bar{V}_i(\sigma, \psi)$ both have the same optimal initial policy and the same total expected reward.

4.3.4 Variance-Covariance Matrix. We conclude this section with a formula for the covariance of $V_i(\tilde{P})$ and $V_j(\tilde{P})$. This equation involves terms of the form $E[\tilde{p}_{i\alpha}^{(n)}\, \tilde{p}_{j\gamma}^{(\nu)} \mid \psi]$, which can be computed with the aid of Theorem 4.1.3. Approximations to $\operatorname{cov}[V_i(\tilde{P}), V_j(\tilde{P}) \mid \psi]$ are considered in Section 4.4.2.
Theorem 4.3.5 If $\tilde{P}$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions closed under consecutive sampling, then the covariance between $V_i(\tilde{P})$ and $V_j(\tilde{P})$ is given by

rev Cy?), V<Jf)| + 3* £ 5^2 P^<+>*U*Wv 

le ^ B wO vcO OtY,k,ffl«i <* ^^ 

Proof. The expected value of the product $V_i(\tilde{P})\, V_j(\tilde{P})$ is
E £V 1 (?) V (P)| tl ■ I ? ? » 4 <»t£> v («,J)dP(P\t) f (4.3*23) 

where 

V*# w J (v, #l <(H * )2 ^ # *•**• •••• " <***W 



Since 

(rV " £ p** * (R ) „ c -, 0«P<1 (4.3.26) 



the double sum in the integrand of (4.3.23) converges uniformly to $V_i(P)\, V_j(P)$ on $\mathcal{S}_N$. Thus,



B CV 4 (P) tf (P) | n • ? 2 0*** 2 r,r fpf n) p , P^VLjftg I + 

-4, (4.3.27) 



Rudin [35], p. 134.

Applying Lemma 2.3.2 twice,
E CV 4 (P)V A CP)|t] » 



*° °* ,n*v 



2 U^)UW^)v«i ^i?^ 1 *<*<*<*>>*• 

n*0 m«0 a,Y t k,r*i * »V <"* QfeEy ia Ja I ay afc 

(4*3*28) 

subtracting 

1 bfO \*a0 a,Y»fc,0>l «se ay afe a/ *# <& Jn » 

(^.3.29) 
from (4.3.28), equation (4.3.22) is obtained. Q.E.D.

s 

in lf\ than \(t) and wvCV (|) t V.(|)\+ 3 are eontiaaons amotions of 

Pmo£« Laraaa fe.3.i isjpaies that the feoondsd fteoUon VL (?) is 

* a 

iateg*abie on ^ . Thus* U^ Ifceorera 2JM» V.Ct ) and £ [V. (P)V 4 (P) 1 t ] 
as*© continuous on T • Q.E.D. 

Consider a Markov chain operating indefinitely under a fixed policy. The conditional expected reward per transition, given that $\tilde{P} = P$, an ergodic transition matrix, is

$$g(P) = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_i(P)\, p_{ij}\, r_{ij}, \qquad (4.4.1)$$

and is known as the gain of the process. When $\tilde{P}$ is a random matrix with the distribution function $F(P \mid \psi)$, which is assumed to be continuous on the boundary of $\mathcal{S}_N$, then $g(\tilde{P})$ is a random variable. The mean and variance of $g(\tilde{P})$ are investigated in this section, assuming that $F(P \mid \psi)$ belongs to a family of distributions closed under the consecutive sampling rule.

4.4.1 Mean and Variance of $g(\tilde{P})$. Let the expected value of $g(\tilde{P})$ be

$$\bar{g}(\psi) = \int_{\mathcal{S}_N} g(P)\, dF(P \mid \psi). \qquad (4.4.2)$$

Equation (4.4.1) shows that $g(P)$ is continuous and bounded, hence, integrable on $\mathcal{S}_N$. If $F(P \mid \psi)$ is continuous on the boundary of $\mathcal{S}_N$, then Lemma 4.2.1 implies the existence of the integral (4.4.2).

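Theorem 4.4.1 If $\tilde{P}$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then the expected gain, $\bar{g}(\psi)$, is given by

$$\bar{g}(\psi) = \sum_{i=1}^{N} \sum_{j=1}^{N} r_{ij}\, \bar{p}_{ij}(\psi)\, \bar{\pi}_i(T_{ij}(\psi)), \qquad (4.4.3)$$

where $\bar{\pi}_i(\psi)$ is defined by equation (4.2.33).

Proof. By equation (4.4.1),

$$\bar{g}(\psi) = \int_{\mathcal{S}_N} \sum_{i=1}^{N} \sum_{j=1}^{N} r_{ij}\, \pi_i(P)\, p_{ij}\, dF(P \mid \psi). \qquad (4.4.4)$$

Application of Lemma 2.3.2 yields (4.4.3). Q.E.D.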



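Theorem 4.4.2 If $\tilde{P}$ has the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling, then the variance of $g(\tilde{P})$ is

$$\operatorname{var}[g(\tilde{P}) \mid \psi] = \sum_{i,j,k,m=1}^{N} r_{ij}\, r_{km}\, \bar{p}_{ij}(\psi) \Big[ \bar{p}_{km}(T_{ij}(\psi))\, \bar{\pi}_{ik}(T_{km}(T_{ij}(\psi))) - \bar{p}_{km}(\psi)\, \bar{\pi}_i(T_{ij}(\psi))\, \bar{\pi}_k(T_{km}(\psi)) \Big], \qquad (4.4.5)$$

where $\bar{\pi}_{ik}(\psi)$ is defined by equation (4.2.52).

Proof. The mean square of $g(\tilde{P})$ is, using Lemma 2.3.2 and equation (4.4.1),

$$E[g^2(\tilde{P}) \mid \psi] = \sum_{i,j,k,m=1}^{N} r_{ij}\, r_{km}\, \bar{p}_{ij}(\psi)\, \bar{p}_{km}(T_{ij}(\psi))\, \bar{\pi}_{ik}(T_{km}(T_{ij}(\psi))). \qquad (4.4.6)$$

The existence of $E[g^2(\tilde{P}) \mid \psi]$ follows from Lemma 4.2.1 and the continuity of the bounded function $\pi_i(P)\, \pi_k(P)\, p_{ij}\, p_{km}$ on $\mathcal{S}_N$. From (4.4.3), the square mean of $g(\tilde{P})$ is

$$\bar{g}^2(\psi) = \sum_{i,j,k,m=1}^{N} r_{ij}\, r_{km}\, \bar{p}_{ij}(\psi)\, \bar{p}_{km}(\psi)\, \bar{\pi}_i(T_{ij}(\psi))\, \bar{\pi}_k(T_{km}(\psi)). \qquad (4.4.7)$$

Subtracting (4.4.7) from (4.4.6), equation (4.4.5) is obtained. Q.E.D.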
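The mean and variance of the gain can also be checked by simulation. The following is a rough Monte Carlo sketch, not the closed-form method of Theorems 4.4.1 and 4.4.2: it assumes that the prior is a matrix beta distribution whose rows are independent Dirichlet vectors with parameter rows `M[i]`, and the helper names and numbers are hypothetical.

```python
import numpy as np


def stationary(P):
    """Stationary vector pi of an ergodic stochastic matrix (pi P = pi)."""
    n = len(P)
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]


def gain(P, R):
    """g(P) = sum_i sum_j pi_i(P) p_ij r_ij."""
    return stationary(P) @ (P * R).sum(axis=1)


def gain_moments(M, R, draws=20000, seed=0):
    """Sample mean and variance of g(P) under the assumed Dirichlet-row prior."""
    rng = np.random.default_rng(seed)
    rows = [rng.dirichlet(m, size=draws) for m in M]   # each of shape (draws, N)
    P_samples = np.stack(rows, axis=1)                 # shape (draws, N, N)
    samples = np.array([gain(P, R) for P in P_samples])
    return samples.mean(), samples.var()


R_example = np.array([[1.0, 1.0], [3.0, 3.0]])         # hypothetical rewards
M_example = np.array([[2.0, 1.0], [1.0, 3.0]])         # hypothetical prior parameter
print(gain_moments(M_example, R_example, draws=2000))
```

A sampled matrix has strictly positive entries with probability one, so each draw is ergodic and its stationary vector is well defined. Such a simulation is useful mainly as an independent check on values computed from (4.4.3) and (4.4.5).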

IBWRf^ffl fftitibfl If jfhas the distribution function F(Pl^f)ec^» a 
faaily of distributions continuous on the baundaxy of Jo„ which is 

H 

continuous in T • them the expectations I(^) and ^mrCgCP)!^ ] a?© 

■I 

continuous functions of f' on f • 

&g&£« The theorem follows ieeaediately froa Theorca 2*J»*3« G.E.Jh 



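The preceding results can be used to approximate the mean and the covariance matrix of the discounted reward vector $V(\tilde{P})$ discussed in Section 4.3. We assume throughout that the prior distribution function of $\tilde{P}$ is $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under consecutive sampling.

The expected value of $V_i(\tilde{P})$ is $\bar{V}_i(\psi)$, which is given by equation (4.3.7) in terms of the mean $n$-step transition probabilities, $\bar{p}_{ij}^{(n)}(\psi)$. Since

$$\lim_{n \to \infty} \bar{p}_{ij}^{(n)}(\psi) = \bar{\pi}_j(\psi),$$

we can replace $\bar{p}_{ij}^{(n)}(T_{jk}(\psi))$ in (4.3.7) by $\bar{\pi}_j(T_{jk}(\psi))$ for all $n$ larger than some integer $n_0$ to obtain the approximation

$$\bar{V}_i(\psi) \approx \sum_{n=0}^{n_0} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{p}_{ij}^{(n)}(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk} + \frac{\beta^{n_0+1}}{1-\beta} \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{\pi}_j(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk}, \qquad (4.4.8)$$

or, using (4.4.3),

$$\bar{V}_i(\psi) \approx \sum_{n=0}^{n_0} \beta^n \sum_{j=1}^{N} \sum_{k=1}^{N} \bar{p}_{ij}^{(n)}(T_{jk}(\psi))\, \bar{p}_{jk}(\psi)\, r_{jk} + \frac{\beta^{n_0+1}}{1-\beta}\, \bar{g}(\psi), \qquad (4.4.9)$$

$i = 1, \dots, N$; $n_0 = 0, 1, 2, \dots$; $0 \le \beta < 1$.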
0^P<1 
The error incurred when the approximation (4.4.9) is used is

n ^o^if^p*** 
OiP<i 



Since 



-(n) <T (r)) „ y^ ( y^ » 



6 i, 
i»3°i» •••• $ 



(4.4.&1) 



& « Cn 9 f ) can be bounded fcjr 



t3h©r© 



•118. 

lgl» ••«• H 

^« ¥ 

R 

is the mean one-step transition reward.

The bound (4.4.12) is conservative; tighter bounds require a tighter bound on $|\bar{p}_{ij}^{(n)}(\psi) - \bar{\pi}_j(\psi)|$ than that provided by (4.4.11). This is a problem for future investigation.

The covariance between $V_i(\tilde{P})$ and $V_j(\tilde{P})$ is given by equation (4.3.22).

By Theorem 4.2.8,

n-*- scrf a ^ ) |+ 1 • T^ ( + >• l 9 J,o*et, ...» S <4.*.l 
v^co la $a en ^ e § 

For all $n > n_0$ and $\nu > \nu_0$, let us use the approximations

(4.4.15)

in equation (4.3.22). Using (4.4.5), we then have

*>» IX <p>, v (P) | f ] 4 s 2 f™ c ^ . ( t >*_ r 

an *v +2 
♦ ^7— — wrCg(Slt3. U& » (4.4.16) 

•She anw involved la using (4.4.16) to appsaaiKat© eov (X(P)« V«(?) | + 






- fretful VV* M1 - WV t))] yv+» 

(W».i7) 



• e 



A eonservative bound on ©. .(a »u $ •p) is 



Vv**2 H 



k>Vit)| ~— sr s 5i*<t>kJ . fej RLW^M ^ (f) l 

1 M ' (i«0) z o^tM^* ** » ©y! L sy ok 



4.4.3 An Abelian Theorem. We conclude this chapter with a theorem which relates $\bar{g}(\psi)$ to $\bar{V}_i(\psi)$. Some results from the theory of summation of divergent series are required and are summarized here without proof.

Let $\{a_n\} = \{a_0, a_1, a_2, \dots\}$ be a sequence of real numbers. Let

$$t_n = \frac{1}{n+1} \sum_{\nu=0}^{n} a_\nu.$$

If $\lim_{n \to \infty} t_n$ exists and is equal to $t$, then the sequence $\{a_n\}$ is said to be $C_1$-summable, or Cesàro-summable, to $t$. If the $C_1$-sum of $\{a_n\}$ exists and is equal to $t$, then a theorem due to Abel states that $\lim_{\beta \to 1} (1-\beta) \sum_{n=0}^{\infty} a_n \beta^n$ exists and is equal to $t$.
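A small numerical illustration of these two notions of summability may be helpful. The sketch below uses an example of our own choosing, not one from the text: the sequence $a_n = p_{11}^{(n)}$ for the periodic two-state chain with transition matrix $[[0, 1], [1, 0]]$, which alternates $1, 0, 1, 0, \dots$ and therefore has no ordinary limit. The function names are hypothetical, and the Abel sum is truncated to the terms supplied.

```python
import numpy as np


def cesaro_mean(a):
    """t_n = (a_0 + ... + a_n) / (n + 1), where n is the last index of `a`."""
    return float(np.mean(a))


def abel_mean(a, beta):
    """(1 - beta) * sum_n a_n beta**n, truncated to the terms in `a`."""
    n = np.arange(len(a))
    return float((1.0 - beta) * np.sum(a * beta**n))


# a_n = 1 for even n and 0 for odd n.
a = np.array([1.0 if n % 2 == 0 else 0.0 for n in range(40000)])

print("Cesaro mean:", cesaro_mean(a))
for beta in (0.9, 0.99, 0.999):
    print("Abel mean at beta =", beta, ":", abel_mean(a, beta))
```

For this sequence the Abel mean equals $(1-\beta)/(1-\beta^2) = 1/(1+\beta)$, which tends to $1/2$ as $\beta \to 1$, in agreement with the $C_1$-sum of $1/2$.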



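Theorem 4.4.4 Let $\tilde{P}$ have the distribution function $F(P \mid \psi) \in \mathcal{F}$, a family of distributions continuous on the boundary of $\mathcal{S}_N$ which is closed under the consecutive sampling rule. Let $\bar{V}_i(\psi) = \bar{V}_i(\beta, \psi)$ and $\bar{g}(\psi)$ be defined by equations (4.3.2) and (4.4.2) respectively. Then

$$\lim_{\beta \to 1} (1-\beta)\, \bar{V}_i(\beta, \psi) = \bar{g}(\psi), \qquad i = 1, \dots, N, \quad \psi \in \Psi.$$

* See, for example, Knopp [27].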

Proof. We first show that the $C_1$-sum of the sequence $\{\bar{p}_{ij}^{(n)}(\psi)\}$ exists and is equal to $\bar{\pi}_j(\psi)$ for $i, j = 1, \dots, N$ and any $\psi \in \Psi$. Let






Lot «>0 bo ghron. Chaose 

Than, for n > n , 



O 

n soon ^feat, lor fixed ladiees 1 and j and 



p<">(+-) - if (t)| < 



•ft ^ * 

a • n > n 



(&.4.SS) 





^^ I, l^>- , "i«t)l*f 



* MM 

ehoos&ng an Integer M > n such that, If n >/* , 



wq hav© 3 :<br n » 



< * » 






and, thepstfbre, j£!iL> **«(»»*> • IT < tfOt proving the 
Qfeta& equation (fe.3.7), 

P"**- * £& feed dk # 0^*~ go© *3 



(4.^.aS) 



-115- 

Haas, using aqaat&oa ^.^*3)> 

£S u (M4 \ce.-n - £ £ ^<y ^V** 



CHAPTER 5
TERMINAL CONTROL PROBLEMS

In this chapter we consider sequential sampling models of a Markov chain with alternatives in which there is an explicit sampling cost. This leads to a distinction between sampling the process and using the process. When the process is sampled, the sequence of states occupied by the Markov chain during the sampling period is made known to the decision-maker, who then uses this information to update his prior distribution on $\tilde{P}$. During the sampling period the process earns transition rewards as specified by $R$ and sampling costs are incurred.
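As a concrete picture of how a sampled state sequence updates the prior, consider the following minimal sketch. It assumes a matrix beta prior parameterized by a matrix of positive counts `M` (an assumption about the parameterization, with hypothetical function names), for a chain with a single alternative in each state; observing a transition from state $i$ to state $j$ then adds one to entry $(i, j)$.

```python
import numpy as np


def update_matrix_beta(M, states):
    """Posterior parameter after observing consecutive states s_0, s_1, ..., s_n."""
    M_post = np.array(M, dtype=float)
    for i, j in zip(states[:-1], states[1:]):
        M_post[i, j] += 1.0          # one more observed i -> j transition
    return M_post


def mean_matrix(M):
    """Mean transition matrix: each row of M normalized to sum to one."""
    return M / M.sum(axis=1, keepdims=True)


prior = np.ones((2, 2))                            # hypothetical vague prior
posterior = update_matrix_beta(prior, [0, 1, 1, 0])
print(posterior)
print(mean_matrix(posterior))
```

Only the transition counts matter under consecutive sampling, so the posterior parameter is the same for any sample history with the same counts.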

On the other hand, if the process is used over a period of $n$ transitions, it earns transition rewards, but the decision-maker is permitted to observe only the initial state and the final state of the sample sequence. The only sample cost incurred is that for observing the state of the system after the $n$th transition.

It is reasonable to expect that, after a finite amount of sampling, the prior distribution of $\tilde{P}$ will be sufficiently tight that the best course of action for the decision-maker will be to cease sampling and operate the process under some fixed terminal policy indefinitely. Hence, the models of this chapter are called terminal control models. In the following sections we show that such a terminal decision point occurs with probability one in an optimal sampling strategy.

Terminal control models are applicable, in general, to any Markov chain with alternatives in which rewards are earned independently of the decision-maker's knowledge of the sequence of states occupied by the system and in which it is possible for him to determine the state of the system at any time, for a non-zero cost. A specific example of such a process is a Markov chain model of consumer brand-switching behavior, where a survey must be made to determine the current state of the market.

Two-action sequential sampling problems with independent identically distributed observations have been examined from the Bayesian point of view by Wetherill [40]. A similar problem with Markov-dependent observations was recently considered by Bhat [9].

In Section 5.1 we define Model I, a discounted terminal control model in which the decision-maker must sample at every transition of the process until a terminal decision point is reached. This model is formulated as a set of functional equations and it is shown that a terminal decision point is reached with probability one in an optimal sampling strategy. It is shown, in Section 5.2, that there exists a unique solution to these equations and a method of successive approximations is introduced. This model is generalized in Section 5.3, where Model II is introduced. Model II is a discounted terminal control model in which the decision-maker can either sample or use the process until a terminal decision is made. Approximate methods of making terminal decisions are discussed in Section 5.4. Models of undiscounted processes are introduced in Section 5.5 and the chapter concludes with a brief discussion of set-up costs.



5.1

Consider a Markov chain with alternatives which has the reward matrix $R = [r_{ij}^k]$. At each transition the decision-maker can either sample the process or choose a terminal policy under which the system is to be operated indefinitely. Let $c_i > 0$ be the cost of observing the system and finding it in state $i$ ($i = 1, \dots, N$). The cost of any sampling strategy is, therefore, a random variable before it is executed. Assuming that the interval between transitions is constant, we may use this interval as the unit of time. Let $\beta$ be the present value of a unit reward received one unit of time in the future ($0 \le \beta < 1$). We shall seek a sampling strategy which maximizes the expected total discounted reward over an infinite period.

When the decision-maker chooses to sample we clearly have a case of consecutive sampling. Thus, it is assumed that the prior distribution function of $\tilde{P}$ is $H(P \mid \psi) \in \mathcal{H}$, a family of distributions closed under consecutive sampling. Let $(i, \psi)$ denote the generalized state of the system ($i = 1, \dots, N$; $\psi \in \Psi$) and let $v_i(\psi)$ be the supremum of the expected total discounted reward over an infinite period if the system starts from the generalized state $(i, \psi)$. In Theorem 5.1.1 it will be shown that an optimal sampling strategy exists and, therefore, that $v_i(\psi)$ is the maximum expected discounted reward over an infinite period.

If, when in state $(i, \psi)$, it is decided to sample the $k$th alternative and the system then makes a transition to state $j$, the supremum of the posterior expected reward is

$$r_{ij}^k - c_j + \beta\, v_j(T_{ij}^k(\psi)). \qquad (5.1.1)$$

Model I is easily generalized to allow the sampling cost to be $c_{ij}^k$, the cost of observing a transition from state $i$ to state $j$ under the $k$th alternative in state $i$. In this case equation (5.1.3) below is

$$\bar{c}_i^k(\psi) = \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, c_{ij}^k.$$

Model II, however, requires that the sampling cost be independent of the state from which the transition originated.



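The probability of the sample outcome $j$, unconditional with respect to the prior distribution and given that the system is in state $(i, \psi)$ and alternative $k$ is in use, is the prior expected value of $\tilde{p}_{ij}^k$,

$$\bar{p}_{ij}^k(\psi) = \int p_{ij}^k\, dH(P \mid \psi). \qquad (5.1.2)$$

Let

$$\bar{c}_i^k(\psi) = \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, c_j \qquad (5.1.3)$$

be the expected cost of sampling alternative $k$ when in the state $(i, \psi)$. Then, if it is decided to sample the process on the next transition, the supremum of the prior expected reward is

$$\max_k \Big\{ \bar{q}_i^k(\psi) - \bar{c}_i^k(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, v_j(T_{ij}^k(\psi)) \Big\}, \qquad i = 1, \dots, N. \qquad (5.1.4)$$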

Suppose, on the other hand, it is decided to cease sampling and to operate the process indefinitely under the policy $\sigma$. Let

$$\bar{V}_i(\sigma, \psi) = \int V_i(\sigma, P)\, dH(P \mid \psi), \qquad i = 1, \dots, N, \quad \sigma \in \Sigma, \qquad (5.1.5)$$

be the unconditional expected discounted reward over an infinite period when the policy $\sigma$ is used and the system starts from $(i, \psi)$. Thus, if it is decided to cease sampling, the maximum prior expected reward is

$$\max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}, \qquad i = 1, \dots, N. \qquad (5.1.6)$$

The maximum exists in (5.1.6) since $\Sigma$ is a finite set.

Using equations (5.1.4) and (5.1.6), we have the following set of

Cf. Section 4.3.



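functional equations which must be satisfied by the vector function $v(\psi) = (v_1(\psi), \dots, v_N(\psi))$:

$$v_i(\psi) = \max \Big\{ \max_k \Big[ \bar{q}_i^k(\psi) - \bar{c}_i^k(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, v_j(T_{ij}^k(\psi)) \Big],\ \max_{\sigma \in \Sigma} \bar{V}_i(\sigma, \psi) \Big\}, \qquad (5.1.7)$$

$i = 1, \dots, N$; $\psi \in \Psi$; $0 \le \beta < 1$.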



It is to be noted that the same symbol, $v_i(\psi)$, is used in equation (3.2.5) and equation (5.1.7) to represent two distinct functions. This has been done to simplify to some extent a necessarily complicated notation. The meaning of the symbol $v_i(\psi)$ will always be clear from its context.

We now show that an optimal sampling strategy exists, making use of the definitions and notation of Section 3.1.

Theorem 5.1.1 Let $v_i(\psi, d)$ be the expected total discounted reward in Model I when the system starts in the generalized state $(i, \psi)$ and the strategy $d \in D_i$ is used. Let

$$v_i(\psi) = \sup_{d \in D_i} v_i(\psi, d), \qquad i = 1, \dots, N, \quad \psi \in \Psi. \qquad (5.1.8)$$

Then there is a sampling strategy $d^* \in D_i$ such that

$$v_i(\psi) = v_i(\psi, d^*). \qquad (5.1.9)$$

Proof. Consider the adaptive control problem of Section 3.1. Let $\bar{D}_i$ be the set of all possible sampling strategies, $\bar{d}$, in the adaptive control problem when the system starts from state $i$. In Model I, if
Sea asp* 9 pp« *}&• &3* 



$d \in D_i$, it is clear that $d$ is a possible strategy in the adaptive control problem; hence, $D_i \subset \bar{D}_i$ ($i = 1, \dots, N$). Suppose $\bar{d} \in \bar{D}_i$. Then $\bar{d}$ either prescribes a fixed policy, $\sigma$, for use on every transition $n > n_0$, for some integer $n_0$, or else not. In the first case, $\bar{d}$ is clearly a possible strategy for Model I. In the second case, $\bar{d}$ is also a possible strategy for Model I, i.e., a strategy in which a terminal decision point is never reached under some or all possible sample histories. Thus, $\bar{D}_i \subset D_i$ and, therefore, $D_i = \bar{D}_i$. The proof of Theorem 3.1.1 is valid for an arbitrary reward structure, provided that the reward per transition is bounded. Thus, the remainder of the proof of Theorem 5.1.1 follows the proof of Theorem 3.1.1 and will not be duplicated here. Q.E.D.

We now show that, with probability one, a terminal decision point is reached in Model I if an optimal sampling strategy is used. Let $Q$ denote the true state of nature; $Q$ is assumed to be positive, as defined by equation (2.3.46).

Theorem 5.1.2 In Model I, if the true state of nature, $Q$, is a positive matrix and if the terminal functions $\bar{V}_i(\sigma, \psi)$ are continuous in $\psi$ ($i = 1, \dots, N$; $\sigma \in \Sigma$), then, with probability one, a terminal decision point is reached in an optimal sampling strategy.

Proof. The proof is by contradiction. Assume there is an optimal sampling strategy in which a terminal decision is never made. Then the process is sampled infinitely often under the consecutive sampling rule and at least one state, $i$, is entered infinitely often. Since at least one alternative, $k$, must be used infinitely often when in state $i$, Lemma 2.3.7 and the positivity of $Q$ imply that every state is entered infinitely often, and, therefore, that at least one alternative in each state is



.sr&tely often* Since the £&"Ste® Is operatS.ng: 
^ternativa «hiah is sampled a t^ad'te lassber of tta 
dowlsatisd by other alternatives after a finite nrariber of transitions e 
eon be olirainated ftoa ftarther consideration* Ite?* ss?e Ray assa 

•3 of ge^o^aHty* that aH alternatives are aaapled infU&toly o 
% Theorem 2.3*8, the saas of the posterior distribution of % 
«ith probability one* to eeneentrate fit § && n t the nozabe? of 
goes to infinity* That is* for any e > 0* if P ie defined tjy equ 
(2,3.3i)» 

JSU P n C If- S|<«1-1. (5.1. 

the Hsit boXrEng «ith probability one. let H(j£\f*) b© the dieter 

M3tien tahlch places the unit jaass of probability on <|. • then* with 
probability osra* ^"> ^ as ps-^ 00 and equation {%\mh) beeoaes 

l**l* »*«& n 
:,t2.QTi (5*a*ii) is a sequential decision problem In t-Mofo 
dee4slca»aaker is certain about the transition psofeab&llties and 1 
studied £y several authors* Klaekwell [ii] fcao cherusn thf.it c: 
strategy ealsts for (5»1>11) in tsfaieh a fissad policy* 2L«£» ks aae 
sgr transition* ffeNurd C^£] **ss shorcti that the espost^d ri 
9 strategy is* in the notation of this proof* 

A W«0 j«i k»i ^ * & * 

«» qf 1 ?^^) ^c the (i»J)th cCtaaant of [QCjDf* « Sat* as «-**>» 



is valid ©tfon if the distribution tghioh r 

.'?ility on Q. is not a lassber of & • 



v 4 C£**)-» \tS»f*) «ith probability one. I*®* \t£>%+*) o 
?^^\C£^ **)} • Stne© ©*> O (*■!• •••• ») equation (M»i> iffl&Ues 
the eontr&dietion 



oo 



» 4 <+*>«' ^ « r ijfctul*? 



- VjCS^t") 



.* Op*. 



6 \<H\ t*). i«l t .... r (5.i.i3) 

Therefore, with probability one, a terminal decision point is reached after a finite number of transitions. Q.E.D.

Let the sequence of vector functions, $\{v(n, \psi)\}$, where $v(n, \psi) = (v_1(n, \psi), \dots, v_N(n, \psi))$, be defined by the equations

$$v_i(n+1, \psi) = \max \Big\{ \max_k \Big[ \bar{q}_i^k(\psi) - \bar{c}_i^k(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, v_j(n, T_{ij}^k(\psi)) \Big],\ \max_{\sigma \in \Sigma} \bar{V}_i(\sigma, \psi) \Big\},$$

$$v_i(0, \psi) = \max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}, \qquad (5.2.1)$$

$i = 1, \dots, N$; $n = 0, 1, 2, \dots$; $\psi \in \Psi$; $0 \le \beta < 1$.

Using equation (5.2.1), it is shown in this section that there exists a unique bounded solution to equation (5.1.7). Equation (5.2.1) can then be used as a computational tool to approximate this unique solution and, with this application in mind, a bound on the error $e_i(n, \psi) = v_i(\psi) - v_i(n, \psi)$ is derived. With the aid of this bound, we show that





*" xf **• ~* {lift «**« Thl 9 ^^ 



Amotions v (n 9 f) defined ty (5*2.1) have the bound 



|v 4 Kf)| * iL ^ 8fi . *«*» —• ■ (5.2.2) 

EP0f,lf Sg • » • 



Os. p< 1 



£KQgg« ^r eqoatlona (&»3*5) and (fe»3«6) 9 

|v 4 C£Lt+>l £*r*y t ***• •••» K <5.2.3) 



re3f 
J2L«£ 



and* slnee C > 9 

k(0,t)| < ?* ♦ g ?, . 1-1. •••• 8 (5.2.« 

1 * ' It-* *e * 

Assone that (5,2,2) holds fes» a* tfoen 



•fa 5" 

2- «• 



^(n ♦ 1* f ) J £ sas CK* * PC * ' ^g ' ' ' » Tqgr 1 



• R A pC * (5.2.5) 



Q,E,D« 



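Theorem 5.2.2 If $v(n, \psi)$ is defined by equation (5.2.1), then $\{v_i(n, \psi)\}$ is a monotone increasing sequence ($i = 1, \dots, N$; $\psi \in \Psi$) and $\lim_{n \to \infty} v(n, \psi)$ exists and is a solution to (5.1.7).

Proof. We show inductively that $\{v_i(n, \psi)\}$ is a monotone increasing sequence. Since $v_i(0, \psi) = \max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}$, $v_i(1, \psi) \ge v_i(0, \psi)$. Assume that $v_i(n, \psi) \ge v_i(n-1, \psi)$ for $i = 1, \dots, N$ and $\psi \in \Psi$. If $v_i(n, \psi) = \max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}$, then $v_i(n+1, \psi) \ge v_i(n, \psi)$. Suppose that, for some $k \in \{1, \dots, K_i\}$,

$$v_i(n, \psi) = \bar{q}_i^k(\psi) - \bar{c}_i^k(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, v_j(n-1, T_{ij}^k(\psi)). \qquad (5.2.6)$$

Then, since

$$v_i(n+1, \psi) \ge \bar{q}_i^k(\psi) - \bar{c}_i^k(\psi) + \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi)\, v_j(n, T_{ij}^k(\psi)), \qquad (5.2.7)$$

we have

$$v_i(n+1, \psi) - v_i(n, \psi) \ge \beta \sum_{j=1}^{N} \bar{p}_{ij}^k(\psi) \big[ v_j(n, T_{ij}^k(\psi)) - v_j(n-1, T_{ij}^k(\psi)) \big] \ge 0, \qquad (5.2.8)$$

proving the induction. By Lemma 5.2.1, the sequence $\{v_i(n, \psi)\}$ is bounded, hence, $\lim_{n \to \infty} v_i(n, \psi)$ exists ($i = 1, \dots, N$). That this limit is a solution of (5.1.7) is seen by letting $n \to \infty$ in (5.2.1). Q.E.D.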

Theorem 5.2.3 There is a unique bounded solution to equation (5.1.7).

Proof. It was shown in Theorem 5.2.2 that there is at least one bounded solution, $v(\psi)$, to (5.1.7). Assume $w(\psi) = (w_1(\psi), \dots, w_N(\psi))$ is also a bounded solution. Let

A»,-,t) «#(r) - rtf(t> ♦ P £ # 4 (^K(^ 4 (r». (5.2,9) 

f ffi'E 
Assqbis (i«^) is £&3EsS« Wwsr© ®p© lots? easss* 

esse i. ^(f) • 3J^^\0&t^ «4w t (t) • £^\(£s+>} . 

Gas© 2. Fftp sssaa ae {i* • ••$ £V, v i^^ " s ?( v » 00 8 4 ') 8"* 



«42&* 



Gas© 3« For 90BW y« {&> ♦..* K 1 g «L<*f ) » S^Cw*<*> ,4-) and 



**>-S*V*«l 



s][fr.*>,t) • s£(w # <*>,*) ^ ▼ (*> - ^<f>* 0. (5-2.11) 

h. Sbs> 8©®8 ia3to©8 a a&& b bota^ng to £l» • ••« S^ 
i^ft) a S*(v,*>, t) aad v & (t) « S^(«,^,t). than 

sJ(T f «»t*) • sffo<* # t) a •ft) - ^(t) * s£<v t ~,+) - S*(w t ~ t f). 

(5.2.12) 
Lefc k index the waAwm of s£(«v»«9 f *f ) • s£(«y>* 9 f) | , 
J\(tr,o» f f') •sj<w f <» t t)| i |s*(* f <» t t)*S*teoP t *)| , assd 
^(▼t^i t) • sj(wt°*i t) • 3to*» in all of th® abov© ©as©@ # 



^CtJ-^Ct))^ 



3 jW>.s*W> 



^ V 2 # 4 (t) 



V&w-w^egf 



(5.2*13) 

5ta© \(^) asd ^(t) ay© both fesrara3e&» thard eedsts a m&&ep, M> 9 
soeh that 

IV t)o \(t>|-<«. « 

Repeated appHeatlon of (5«2.13) jf&e&Ss 

stnoo 0^ p< i, (5.2. ifc) lapULas v % ( t) » ^ & <t)* G.S»B. 



Let 



Wo nsrc? d©g&^© a ft®m& e& the ops©p of th© nth apppeateiai* v.(n*t)» 
^(▼•ntt)"35[ft)-^(f)*P 2 %<t-)^<^jCt». C5.2.15) 



J-l 



i«*, ...» M* 



Item SsZak &®fc the &SVQV of U& nth appsostaant vAn 9 ^) b© 



defined as 



©iCat t) ■ ^(f ) - Tjfott)* (5.2.X6) 



stoeipe v.(n , f') 19 d©an®d t$r (5«S«2) and ^ (f ) i« t&© taa&qo© beua&aa 
selutton of (S.i.?)« Let H sad p be dsfisted fc^ eqtaafcieffa (3*3*3) • *^«» 
«u(©»^) has the boassla 

g^faf)* pP *£ # (5*2*17) 

Proof. By Theorem 5.2.2, $\{v_i(n, \psi)\}$ is a monotone increasing sequence with the limit $v_i(\psi)$; hence, $e_i(n, \psi) \ge 0$ ($n = 0, 1, 2, \dots$). The remainder of the inequality (5.2.17) is proved by induction.

We ftet esta&i&ah that ?.(* )* ife" . Slaee \<SL» ^)^R *? A A 
fop all £j&t we hfiroe 



1 * p *«¥ 



Aosasa© v (n p t )£ f|^ . ?h€£a 9 alaee ©. > (j»l» «.., R)» 



3 



^(^i»t)< Mit Di ♦ A- «i5r 1 -■ A 



andt feg? irataotiora* 



\fcVO* pg . i«i» **•• » (5*2*20) 

HB0»1U2 t *«« 



, ®aae« v.(H') a a!^*^* 4 *^- ^ » pwrtog the aoaeptta. 
Slnoe V 4 CSL»r)>^ *>* ^ ^^ m ***** 



-128- 

\{o,*)± Jg . a^t ...» ■ (5*s«a> 

8E& (5*S*i?) holds ftes» jjoO. Assess© tbo sqpatlon is vaSid Sbs» &• 2ta 9 
©s»golDg as in $*» proof of ftssopsa 5.2. 3» ttapo is an ineieR fee £i • •« 

®«0*#) » |«u(«H-Jtt)| <!#(▼»«», 4") - £(*«»» *) 

* I « II* 2. 

Q.E.D* 



Corollary 5.2.5 If $v(n, \psi)$ is defined by equation (5.2.1) and $v(\psi)$ is the unique bounded solution of equation (5.1.7), then $\{v(n, \psi)\} \to v(\psi)$ uniformly in $\psi$.

Proof. Since the error bound (5.2.17) is independent of $\psi$, the corollary follows from Theorem 5.2.4. Q.E.D.
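The method of successive approximations (5.2.1) for Model I can be sketched as follows for a small chain. This is a rough illustration under several assumptions that are ours and not the text's: the prior is a matrix beta distribution stored as counts `M[i, k, j]`, the terminal function $\max_\sigma \bar{V}_i(\sigma, \psi)$ is replaced by the certainty-equivalent value computed at the prior mean matrix (a simplification, since $\bar{V}_i(\sigma, \psi)$ itself requires the methods of Section 4.3), and all rewards and costs are hypothetical.

```python
from functools import lru_cache
from itertools import product

import numpy as np

BETA = 0.2
N, K = 2, 2                                   # states, alternatives per state
R = np.array([[[10.0, 4.0], [8.0, 6.0]],      # R[i, k, j]: hypothetical rewards
              [[14.0, -1.0], [5.0, 7.0]]])


def terminal(P):
    """Assumed terminal function: best certainty-equivalent value over pure policies."""
    best = np.full(N, -np.inf)
    rows = np.arange(N)
    for sigma in product(range(K), repeat=N):
        P_s, R_s = P[rows, list(sigma)], R[rows, list(sigma)]
        q = (P_s * R_s).sum(axis=1)
        best = np.maximum(best, np.linalg.solve(np.eye(N) - BETA * P_s, q))
    return best


def make_solver(cost):
    """Return v(counts, n), the n-th approximant for sampling costs `cost[j]`."""
    cost = np.asarray(cost, dtype=float)

    @lru_cache(maxsize=None)
    def v(counts, n):
        M = np.array(counts, dtype=float).reshape(N, K, N)
        P = M / M.sum(axis=2, keepdims=True)
        best = terminal(P)                    # value of stopping now
        if n == 0:
            return tuple(best)
        for i in range(N):
            for k in range(K):
                val = P[i, k] @ (R[i, k] - cost)      # expected reward minus cost
                for j in range(N):
                    M_post = M.copy()
                    M_post[i, k, j] += 1.0            # posterior parameter
                    val += BETA * P[i, k, j] * v(tuple(M_post.ravel()), n - 1)[j]
                best[i] = max(best[i], val)           # sample if it is better
        return tuple(best)

    return v


v_cheap = make_solver([0.1, 0.1])
vague_prior = tuple(np.ones(N * K * N))
for n in range(4):
    print(n, np.round(v_cheap(vague_prior, n), 3))
```

Because the stopping value enters the maximum at every iteration, the approximants are non-decreasing in `n`, as in Theorem 5.2.2; when the sampling costs are very large, sampling is never preferred and every approximant equals the terminal function.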



Corollary 5.2.6 If the prior distribution function of $\tilde{P}$ is $H(P \mid \psi) \in \mathcal{H}$, a family of distributions continuous in $\psi$ which is closed under consecutive sampling, then $v(\psi)$, the unique bounded solution of (5.1.7), is a continuous function of $\psi$.

Proof. Since $\mathcal{H}$ is continuous in $\psi$, $v_i(0, \psi) = \max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}$ is a continuous function of $\psi$. Moreover, $\bar{p}_{ij}^k(\psi)$ is continuous. Thus, by induction, $v_i(n, \psi)$ is continuous for $i = 1, \dots, N$ and $n = 0, 1, 2, \dots$. Since $\{v_i(n, \psi)\} \to v_i(\psi)$ uniformly in $\psi$, $v_i(\psi)$ is continuous ($i = 1, \dots, N$). Q.E.D.



M Hffflrotiffil ftsaaaiyyeft. U|£* 

In considering Model I it is immediately apparent that the maximum expected reward will not be decreased, and may be increased, if we stop sampling in some states while continuing to sample in others. For example, if the marginal prior distribution of the first row of $\tilde{P}$ is loose, while the marginal prior distribution of the remaining $(N-1)$ rows of $\tilde{P}$ is tight, it may be profitable to sample only when the system is in state 1. Model II admits this additional option.

As in Section 5.1, let $v_i(\psi)$ be the supremum of the expected total discounted reward over an infinite period when the system starts from the generalized state $(i, \psi)$. It is assumed that the decision-maker can sample the system, can use the system over a period of $n$ transitions, or can make a terminal decision. If the system is sampled the consecutive sampling rule is operative and if the system is used the $\nu$-step sampling rule is operative. Thus, we shall assume that the prior distribution function of $\tilde{P}$ is $H(P \mid \psi) \in \mathcal{H}$, a family of distributions closed under the $\nu$-step sampling rule. Theorem 2.3.4 implies that $\mathcal{H}$ is the mixed extension of a family of distributions which is closed under the consecutive sampling rule and, therefore, $\mathcal{H}$ is also closed under consecutive sampling by Theorem 2.3.3. Thus, if the decision-maker is in the state $(i, \psi)$ and chooses either to sample or to use the process, the posterior distribution of $\tilde{P}$ will be a member of $\mathcal{H}$.

If it is decided to sample when in state $(i, \psi)$, the supremum of the prior expected reward is given by equation (5.1.4). Suppose, on the other hand, it is decided to use the process under policy $\sigma$ for $n > 1$ transitions. The probability that the system will be observed in state $j$, unconditional with regard to the prior distribution of $\tilde{P}$, given that the system starts in the generalized state $(i, \psi)$ and that $n$ transitions are to be observed under the policy $\sigma$, is

$$\bar{p}_{ij}^{(n)}(\sigma, \psi), \qquad n = 2, 3, \dots, \qquad (5.3.1)$$

the $(i, j)$th element of the prior expected $n$-step transition probability matrix under policy $\sigma$. Let $\bar{q}_i^{(n)}(\sigma, \beta, \psi)$ denote the prior expected discounted reward earned over $n$ transitions under the policy $\sigma$ when the system starts from $(i, \psi)$. Both $\bar{p}_{ij}^{(n)}(\sigma, \psi)$ and $\bar{q}_i^{(n)}(\sigma, \beta, \psi)$ are discussed in Section 4.1. Let $T_{ij}^{(n)}(\sigma, \psi)$ denote the parameter of the posterior distribution of $\tilde{P}$ when the system starts from $(i, \psi)$ and is observed in state $j$ after $n$ transitions under the policy $\sigma$. The prior expected reward under these conditions is

H 

^(Xtft*) ♦ P £ ^CJE^t) C^(^(Ba£et)) - ^3 (5.3-8) 

EP2,3 ft ... 

*« J* 

and* if it is d®eid©d to as® th© syst©a e the sopeeasa of tao esoeoted 
cEsosanted rosard is 



>» — . N (5-3-3) 

Finally, if it is decided to make a terminal decision when in the generalized state $(i, \psi)$, the supremum of the expected total discounted reward is

$$\max_{\sigma \in \Sigma} \{\bar{V}_i(\sigma, \psi)\}, \qquad i = 1, \dots, N. \qquad (5.3.4)$$



We shall anticipate Theorem 5.3.1, which establishes the existence of an optimal sampling strategy for Model II, and write max for sup in equation (5.3.3). Then the vector function $v(\psi) = (v_1(\psi), \dots, v_N(\psi))$ must satisfy the following functional equation:



V f) 






fc*i, ♦♦•♦ a (5.3.5) 

We shall now consider some properties of Model II. These properties, for the most part, parallel those of Model I and the proofs are quite similar to those of Section 5.2. In the following two theorems it is shown that an optimal sampling strategy exists for Model II and that, in an optimal sampling strategy, a terminal decision point is reached with probability one. We then demonstrate the existence of a unique bounded solution to equation (5.3.5) and examine a method of successive approximations, together with a bound on the error of the nth approximant.

Theorem 5.3.1 Let $v_i(\psi, d)$ be the expected total discounted reward in Model II when the system starts in the generalized state $(i, \psi)$ and the sampling strategy $d$ is used. Let

$v_i(\psi) = \sup_{d \in D} v_i(\psi, d), \qquad i = 1, \ldots, N.$ (5.3.6)

Then there is a sampling strategy $d^* \in D$ such that

$v_i(\psi) = v_i(\psi, d^*), \qquad i = 1, \ldots, N.$ (5.3.7)

Proof. Let $\bar{D}$ be the set of all possible sampling strategies $d$ in the adaptive control problem. Then $D \subset \bar{D}$. Suppose $d \in \bar{D}$. Then $d$ either prescribes a fixed policy, $\sigma$, for use on every transition $n > m$, for some integer $m \ge 0$, or else not. In either case, $d$ is a possible strategy for Model II and $\bar{D} \subset D$. Thus, $D = \bar{D}$. The remainder of the proof is analogous to the proof of Theorem 3.1.1. Q.E.D.



Theorem 5.3.2 In Model II, if the true state of nature $P$ is a positive matrix and if the terminal functions $v_i(\sigma, \psi)$ are continuous in $\psi$ ($i = 1, \ldots, N$; $\sigma \in \Sigma$), then, with probability one, a terminal decision point is reached in an optimal sampling strategy.

Proof. Assume there is an optimal sampling strategy in which a terminal decision is never made. We shall show a contradiction. Let a decision point in the sample history be a point in time at which the state of the system is made known to the decision-maker. Then the assumption is that there are an infinite number of decision points. There is at least one state, $i$, which is observed infinitely often and at least one policy, $\sigma$, and transition interval, $n$, which are used infinitely often in state $i$. Lemma 2.3.7 and the positivity of $P$ imply that every state is observed infinitely often with probability one. Since a terminal decision is never made, there is a finite integer, $\mu$, such that, if $n$ is a transition interval which is used in the sampling strategy, then $n \le \mu$. For if not, an infinite transition interval is used at some stage, which is equivalent to a terminal decision. Thus, there is a finite set of ordered pairs, $(n, \sigma)$, where $n \in \{1, 2, \ldots, \mu\}$ and $\sigma \in \Sigma$, which describe the decisions made at each decision point. We may assume, without loss of generality, that all members of this finite set are used infinitely often in the sampling strategy, since any pair, $(n, \sigma)$, which is used only a finite number of times is eventually dominated. The conditions of Theorem ?.3.9 are satisfied, and, therefore, the mass of the posterior distribution of $\tilde{P}$ tends, with probability one, to concentrate at $P$ as $\nu$, the number of decision points, goes to infinity. If $H(P \mid \psi^*)$ is the distribution function which places the unit mass of probability on $P$, then $\psi \to \psi^*$ as $\nu \to \infty$ and equation (5.3.5) becomes



Vj( + ) ■ 



T* rrtZ.,,* $*<*** ♦ *" 1 3?tof> 



3, \v l( ^t^ 



3 



-o^ 



It was shown, during the proof of Theorem 5.1.2, that if



lol, ..., N {§.3, 



a contradiction results. Suppose, then, that 



i il ££ 



# 



max $« 
qp2, ...,yk 1 






i»i, ..., N (5«3* 



The argument remains valid when the distribution which places the unit mass of probability on $P$ is not a member of $\Psi$.



We may construct a new set of policies as follows. Let $a = (a_1, \ldots, a_N)$ be a policy vector, where $a_i = (n, \sigma^k)$ is a choice of a transition interval, $n \in \{1, \ldots, \mu\}$, and a policy, $\sigma^k \in \Sigma$. If the alternative $a_i$ is selected in state $i$, then the system goes to state $j$ with probability $\bar{p}^{(n)}_{ij}(\sigma^k, \psi^*)$,



earning the expected reward q^ (2T k »P»^) - P o.. If S is 

the set of all possible alternatives $a$, then equation (5.3.10) can be written as

$i = 1, \ldots, N$ (5.3.11)

Equation (5.3.11) has the same formal structure as equation (5.1.11) in the proof of Theorem 5.1.2. An argument similar to that leading to equation (5.1.13) shows the contradiction

$v_i(\psi^*) < \max_{\sigma \in \Sigma} \{ v_i(\sigma, \psi^*) \}, \qquad i = 1, \ldots, N.$ (5.3.12)

Thus, with probability one, a terminal decision point is reached. Q.E.D.

Let us now consider the existence and uniqueness of solutions to the functional equation (5.3.5). Let the sequence of vector functions, $v(n, \psi)$, where $v(n, \psi) = (v_1(n, \psi), \ldots, v_N(n, \psi))$, be defined by the following equations:



v.(a*i ? , t f) a c»x 



i**l, * . * t 
0*p<i 



H (5.3.13a) 






iaas 



v^'-sM • 






^aa x*R*«i?3?k il^ij^ a^ 6 " T foV ****** 

functions $v_i(n, \psi)$ defined by (5.3.13) have the bound



R* + PC 



V"*>l*Htf 



i*i t •••» H C3»3« 



Proof. The proof is by induction. Equation (5.2.4) shows that (5.3.14) holds for $n = 0$. Assume it holds for $n$. Since
02 



\\Hl) 



^w. 



$\le R$, an induction using equation (4.1.9) shows that

k (v) (^S,t)j *££r*. i«i ..., b (5.3**5) 



Thus, from (5.3.13) and the induction hypothesis, we have



v.CiH-i, *f ) I £ ass 






e 

c*6 



■R,.f,ec _^ sax . S R * + g v c l-. 
ThF^ v*g t ...» mi \ "^/' y 






(5.3.16) 



Q.S.D. 



Theorem 5.3.4 If $v(n, \psi)$ is defined by equation (5.3.13), then $\{v_i(n, \psi)\}$ is a monotone increasing sequence ($i = 1, \ldots, N$), $\lim_{n \to \infty} v_i(n, \psi)$ exists and is a solution of (5.3.5).






Proof. The proof that $\{v_i(n, \psi)\}$ is monotone increasing is inductive. Clearly, $v_i(1, \psi) \ge v_i(0, \psi)$ for $i = 1, \ldots, N$ and $\psi \in \Psi$. Assume that $v_i(n, \psi) \ge v_i(n-1, \psi)$. If, for some $\sigma \in \Sigma$ and some integer $\nu$,

then

£4 






^ 0. (5.3.18) 

If ^(n»t) * 2Sc {\<£* W} #•■ *£<«►*#* )> ^C* f}» est, usteg 

C5«S,6)» ^© hstf©» in all ©&©©©» 

i i + t <% 

and, therefore, $\lim_{n \to \infty} v_i(n, \psi)$ exists. That the limit satisfies (5.3.5) follows by letting $n \to \infty$ in (5.3.13). Q.E.D.

The remaining theorems, the proofs of which parallel very closely those of corresponding theorems in Section 5.2, are stated without proof.

Theorem 5.3.5 There is a unique bounded solution to equation (5.3.5).

Theorem 5.3.6 Let the error of the nth approximant, $v_i(n, \psi)$, be defined as

$e_i(n, \psi) = v_i(\psi) - v_i(n, \psi), \qquad i = 1, \ldots, N,$ (5.3.20)

where $v(\psi) = (v_1(\psi), \ldots, v_N(\psi))$ is the unique bounded solution of equation (5.3.5) and $v(n, \psi)$ is defined by (5.3.13). Then $e_i(n, \psi)$ has the bound

$0 \le e_i(n, \psi) \le \beta^n \, \dfrac{R - r}{1 - \beta}, \qquad i = 1, \ldots, N; \quad n = 0, 1, 2, \ldots$ (5.3.21)

where $R$ and $r$ are defined by (3.3.3).



Theorem 5.3.7 If $v(n, \psi)$ is defined by equation (5.3.13) and $v(\psi)$ is the unique bounded solution of (5.3.5), then $v(n, \psi) \to v(\psi)$ uniformly in $\psi$.

Theorem 5.3.8 If the prior distribution function of $\tilde{P}$ is $H(P \mid \psi) \in \Psi$, a family of distributions continuous in $\psi$ which is closed under $\nu$-step sampling, then $v(\psi)$, the unique bounded solution of (5.3.5), is continuous in $\psi$.
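
The monotone convergence and geometric error decay asserted in Theorems 5.3.4 to 5.3.7 can be illustrated on a much simpler stand-in problem. The sketch below is not the scheme (5.3.13) itself: it assumes the transition probabilities are known, so that the state is only the physical state $i$, and it starts the iteration from the lower bound $r/(1-\beta)$. The two-state rewards and transition probabilities are hypothetical values chosen for illustration.

```python
def bellman_step(v, rewards, trans, beta):
    """One successive-approximation step for a chain with KNOWN transition
    probabilities (a simplification; the thesis iterates over (i, psi))."""
    new_v = []
    for i in range(len(v)):
        best = max(
            rewards[i][k] + beta * sum(p * vj for p, vj in zip(trans[i][k], v))
            for k in range(len(rewards[i]))
        )
        new_v.append(best)
    return new_v


def successive_approximations(rewards, trans, beta, n_steps):
    """Return the list of iterates v(0), v(1), ..., v(n_steps)."""
    r_min = min(min(row) for row in rewards)
    v = [r_min / (1.0 - beta)] * len(rewards)  # lower bound as starting point
    history = [v]
    for _ in range(n_steps):
        v = bellman_step(v, rewards, trans, beta)
        history.append(v)
    return history


# Hypothetical data: rewards[i][k] and trans[i][k][j] for state i, alternative k.
REWARDS = [[1.0, 2.0], [0.0, 0.5]]
TRANS = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.7, 0.3]]]
BETA = 0.9

history = successive_approximations(REWARDS, TRANS, BETA, 200)
```

In this simplified setting the iterates never decrease, and the gap to the limit is at most $\beta^n (R - r)/(1 - \beta)$, with $R = 2$ and $r = 0$ the largest and smallest rewards of the example.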

The numerical solution of Model II requires considerably more computation than does the solution of Model I. Not only does equation (5.3.13), the successive approximation scheme for Model II, involve evaluation of more terms than does the corresponding scheme for Model I, but the requirement that $\Psi$, the family of prior distributions of $\tilde{P}$, be closed under $\nu$-step sampling implies that $\Psi$ is an extension of a family of distributions closed under consecutive sampling. This means that the number of parameters which must be handled in computing solutions to Model II is larger than that required for solving Model I. The additional complexity of Model II is probably worthwhile only in the case of a prior distribution which is tight on some rows of $\tilde{P}$ and loose on others and when the cost of sampling is high.

We note that, while the aim of Model II is to allow the decision-maker to sample only those states in which there are transition probability vectors with loose marginal prior distributions, he does not have full control over the future states in which the system may be observed. For example, suppose it is desired to sample the system only when it is in state $i$. Then a sampling strategy must be chosen which trades off the expected discounted earnings of the system against the need for a high probability that the system enters state $i$ at each decision point.

We remark, in this connection, that a decision to use the process when in state $i$ does not necessarily imply that the consecutive sampling alternative is dominated at future decision points when the system is found in state $i$. Such dominance may hold under a sample history which reduces the marginal variances of the alternative transition probabilities in the ith state, but there is certainly no reason to expect this to be the case under a sequence of observations which increases the marginal variances of some of these transition probabilities.



5«& toOTtofra 

In Models I and II it is necessary to evaluate expressions of the form

$\max_{\sigma \in \Sigma} \{ v_i(\sigma, \psi) \}$ (5.4.1)

where $v_i(\sigma, \psi)$ is the expected total discounted reward earned over an infinite period under the policy $\sigma$ when the system starts from the generalized state $(i, \psi)$. Since $\Sigma$ may contain a large number of policies, it is desirable to find methods of solving (5.4.1) for the maximizing policy $\sigma^*$ which avoid a direct search over all elements of $\Sigma$. This problem has not been solved, but some preliminary remarks concerning the approximation of $\sigma^*$ are offered in this section. It will be seen that these remarks are also applicable to the problem of selecting a policy which maximizes the expected gain, $\bar{g}(\sigma, \psi)$, which was discussed in Chapter 4.

Let $v_i(\sigma, P)$ be the conditional expected total discounted reward over an infinite period under policy $\sigma$ when the system starts from state $i$ and $\tilde{P} = P$. The policy which maximizes this reward can be found efficiently by means of Howard's policy iteration algorithm [22]. It was seen in the proofs of Theorems 5.1.2 and 5.3.2 that, as the number of observations in Model I or Model II goes to infinity, the mass of the posterior probability distribution of $\tilde{P}$ tends, with probability one, to concentrate at the true state of nature. Thus, if $\bar{P}$ is the mean of the distribution of $\tilde{P}$, we can approximate $\sigma^*$ by $\hat{\sigma}$, where $\hat{\sigma}$ is defined by

$v_i(\hat{\sigma}, \bar{P}) = \max_{\sigma \in \Sigma} v_i(\sigma, \bar{P}).$

The error of this approximation goes to zero with probability one as the number of observations of the process goes to infinity. We consider here a bound on the error.
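
As a hedged illustration of this approximation, the sketch below computes a policy that is optimal for the mean matrix $\bar{P}$ by policy iteration. It is a minimal sketch, not the procedure of [22] as printed: the prior is assumed, purely for illustration, to have independent Dirichlet rows with the hypothetical parameters `PRIOR_COUNTS`, so that each mean row is the normalized parameter vector, and `REWARDS[i][k]` stands for the expected immediate reward of alternative $k$ in state $i$.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def policy_value(policy, rewards, trans, beta):
    """Discounted value of a fixed policy: solve (I - beta * P_sigma) v = q_sigma."""
    n = len(policy)
    a = [[(1.0 if i == j else 0.0) - beta * trans[i][policy[i]][j]
          for j in range(n)] for i in range(n)]
    b = [rewards[i][policy[i]] for i in range(n)]
    return solve_linear(a, b)


def policy_iteration(rewards, trans, beta):
    """Alternate value determination and policy improvement until stable."""
    n = len(rewards)
    policy = [0] * n
    while True:
        v = policy_value(policy, rewards, trans, beta)
        improved = [
            max(range(len(rewards[i])),
                key=lambda k: rewards[i][k]
                + beta * sum(p * vj for p, vj in zip(trans[i][k], v)))
            for i in range(n)
        ]
        if improved == policy:
            return policy, v
        policy = improved


# Hypothetical Dirichlet parameters PRIOR_COUNTS[i][k][j] (an assumption).
PRIOR_COUNTS = [[[9.0, 1.0], [2.0, 8.0]], [[5.0, 5.0], [7.0, 3.0]]]
REWARDS = [[1.0, 2.0], [0.0, 0.5]]
BETA = 0.9

# Mean transition matrix: each row of parameters normalized to sum to one.
MEAN_TRANS = [[[c / sum(row) for c in row] for row in state] for state in PRIOR_COUNTS]
sigma_hat, value_hat = policy_iteration(REWARDS, MEAN_TRANS, BETA)
```

The policy `sigma_hat` is optimal only for the mean matrix; how far its prior expected reward falls short of that of $\sigma^*$ is the question taken up by the error bound.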

where $J$ is the number of distinct policies in $\Sigma$. For a fixed index $i$, let the set of possible values of $\tilde{P}$ be partitioned into a set of $J$ mutually exclusive and exhaustive events, $S_{ij}$, such that, if $P \in S_{ij}$, then

If $H(P \mid \psi)$ is the prior distribution function of $\tilde{P}$, let

denote the prior probability that $\tilde{P}$ belongs to $S_{ij}$. Since the sets $S_{ij}$ partition this set,








teb \j£* t> b» tt* conditional eas>ecte®a 'waSa© of \<£;*if>t givwi 
that £35.4* 



re ** 



W© mio that <5*^«^) tapSiee 



+3 3 **> ^et 

&©t R(JL) ea& p(^) b® tfco sssliaaa end E&n&ssa tsesalt&on KBassda «beo th© 

1 R 

ead let ? ml n b© dafttaed fcy agaattei (3«3«3)« Or ©pattea (&*3«i>» 

▼«(£*£> » 2 ^ £ £ ^(sJ p£ »J (5^.10) 

cr«£ 



ife'^P* \C^f)^fp^T^- C5*.!i> 






— « v„ C£»t> s 2J1L. ** J C5^.ig) 

e 
p 1 



^a^JL Rw 3°U •*•• <?» the fbl34?a&ng inoqaaHt^ to valid, 






&g&s£* 3£nao the sets S pa^Utlen ^ . „ ws hm® 9 oaing (5.&»i£) 

EepeUen (5*^*13) S»s a peaypaagsBiesjt of (5*fe*&4)« Q.E.D. 

Lemma 5.4.2 For any policy $\sigma \in \Sigma$ and any index $j \in \{1, \ldots, J\}$, the expected discounted reward under the policy $\sigma$ has the upper bound

i-P« 4 C^ ) 



0s£< i 



j^gfig. We haw©» taslng equations (5«^»&i)# (5*^*B)» ®«d lassm 5.&.H, 



Q.S.D. 



*..*». 1 lot ® & (S^« t) be defined tey 



• •••• ' 

where $\sigma^*$ is the maximizing terminal policy defined by equation (5.4.1). Then $e_i(\hat{\sigma}, \psi)$ has the bounds

o^p< 1 

^aattC* ^ equation (S«^&)» ^(S,**^)^ \l&»*) ®aet •jjt&t'*")^ 0« 

The upper half of the inequality follows from (5.4.15). Q.E.D.

In order to bound $e_i(\hat{\sigma}, \psi)$ using equation (5.4.18), it is necessary to evaluate $P_{ij}(\psi)$. This problem has not been completely solved, chiefly because there is no satisfactory method of finding the boundaries of the set $S_{ij}$. Moreover, $S_{ij}$ is not necessarily a connected set, which further complicates the problem. The probability $P_{ij}(\psi)$ can be estimated by using numerical or Monte Carlo techniques.
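
A Monte Carlo estimate of this kind might be organized as in the following sketch. Several things here are assumptions made for illustration and are not taken from the text: the prior is represented by independent Dirichlet rows with hypothetical parameters `PRIOR[i][k]`, the region attached to a policy is taken to be the set of matrices for which that policy gives the largest discounted reward from state $i$, and policy values are obtained by a fixed number of iterative sweeps rather than by an exact solution.

```python
import random
from itertools import product


def policy_value(policy, rewards, trans, beta, sweeps=200):
    """Approximate discounted value of a fixed policy by repeated substitution."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [rewards[i][policy[i]]
             + beta * sum(trans[i][policy[i]][j] * v[j] for j in range(n))
             for i in range(n)]
    return v


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet distribution."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def estimate_region_probs(i, prior, rewards, beta, draws, seed=0):
    """Estimate, for each policy, the prior probability that it is best from state i."""
    rng = random.Random(seed)
    policies = list(product(*[range(len(row)) for row in prior]))
    counts = {pol: 0 for pol in policies}
    for _ in range(draws):
        trans = [[sample_dirichlet(alpha, rng) for alpha in row] for row in prior]
        best = max(policies, key=lambda pol: policy_value(pol, rewards, trans, beta)[i])
        counts[best] += 1
    return {pol: c / draws for pol, c in counts.items()}


# Hypothetical prior parameters and rewards (assumptions for illustration).
PRIOR = [[[9.0, 1.0], [2.0, 8.0]], [[5.0, 5.0], [7.0, 3.0]]]
REWARDS = [[1.0, 2.0], [0.0, 0.5]]
probs = estimate_region_probs(0, PRIOR, REWARDS, 0.9, draws=200)
```

Because each draw is assigned to exactly one policy, the estimates sum to one, mirroring the fact that the sets $S_{ij}$ form a partition; the sampling error shrinks only as the square root of the number of draws.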

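If $g(\sigma, P)$ is the gain of a Markov chain with alternatives when operated indefinitely under the policy $\sigma$, and if $\bar{g}(\sigma, \psi)$ is the corresponding expected value when $\tilde{P}$ has the distribution function $H(P \mid \psi)$, then, as we shall see in the next section, it is often necessary to evaluate expressions of the form

$\bar{g}(\sigma^*, \psi) = \max_{\sigma \in \Sigma} \bar{g}(\sigma, \psi).$ (5.4.19)

If $\bar{P}$ is the mean of the distribution $H(P \mid \psi)$ we may wish to approximate $\sigma^*$ by $\hat{\sigma}$, defined by the expression

$g(\hat{\sigma}, \bar{P}) = \max_{\sigma \in \Sigma} g(\sigma, \bar{P}).$ (5.4.20)

There are efficient algorithms for the solution of (5.4.20) [16, 22]. Let the error of the approximation $\hat{\sigma}$ be defined as

$e(\hat{\sigma}, \psi) = \bar{g}(\sigma^*, \psi) - \bar{g}(\hat{\sigma}, \psi).$ (5.4.21)

A bound on $e(\hat{\sigma}, \psi)$ similar to that of equation (5.4.18) is easily derived.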

Lot jy K ?J be the set of ail positive Ks8 generalised stoahasti© 

e 
matriees and let £ „ m be partitioned into <£ sets* S ., vhere t if £ eS. 9 

Xf S(£|t) is the ps&or distribution function of | , let 

be the prior probability that ff * S 4* *** 

5 p^r> j - ~ £e£ 

be the conditions! a^oeotat&on of g(l£»g ) given that & eS.» 3fcsn» by 

If R(2T) and r(£") are defined b^ £5*^*9> and R and r are defined bgr 
C3.3.3>o (**•**. i) lilies the inequalities 

r as. 5(2:»^)^^ 2e£ (S«&«26) 



^*» •••• J 

^ ijCSCj, r)P«( ^) * U-PjC *»*(£>• (5.4.29) 



Q«E«D« 



Sfffen.S $kr 60(7 go24ey ^jc£ sad any je {l» •••* i} » 

, $r (5^*^6)t C5-^S5)t w& Loam. 5>fc*4t 
SCCa V> a !«(£* «Pj(r ) ♦ £ \(£L> t)P fc ( t > 

Q.S.D. 

flftiKKM ^ t*f«fi the estop teat&osa ©tSC-» *P) <Mtoed fey ©goati©!* 
<5*^«2i) has tfe© tenodl 

3 J J <p«£ 

IfegQg. 2fe© tbea&era SbUoas dS,s«©e^ £*®a sgoatieas (5&A9) m& 



5.5 

We have commented in Section 3.6 on the lack of a clear criterion for making decisions in an adaptive control model with no discounting. These remarks apply as well to undiscounted terminal control models. One criterion we may use is to consider the class of sampling strategies which maximize the expected steady-state gain of the process, then to choose from this class the strategy which maximizes the expected reward over the transient period which precedes the terminal decision. This criterion is made precise in the present section, where Models III and IV, the undiscounted analogues of Models I and II, are introduced. No analysis of these models has been carried out.

5.5.1 Model III. In Model III it is assumed that the process will be sampled consecutively until a terminal decision point is reached, at which time a terminal policy is selected and the system is operated under this policy over a finite terminal operation period.

Let $v_i(\psi; \nu)$ be the supremum of the expected reward over a period whose terminal operation phase lasts for $\nu$ transitions when the system starts in state $i$ and the prior distribution of $\tilde{P}$ is $H(P \mid \psi) \in \Psi$, a family of distributions closed under consecutive sampling. If, when in state $(i, \psi)$, it is decided to sample at least once more, the supremum of the prior expected reward is given by equation (5.1.4) with $\beta = 1$.

Since we shall be concerned with large values of $\nu$, we assume that, when it is decided to cease sampling, a terminal policy will be selected which maximizes the steady-state gain of the system, $\bar{g}(\sigma, \psi)$. Therefore, when it is decided to cease sampling, the supremum of the expected reward over the terminal period is


$\nu \max_{\sigma \in \Sigma} \{ \bar{g}(\sigma, \psi) \}.$ (5.5.1)



vA t $%>) « saaz 



Thus, under the assumptions of Model III, $v_i(\psi; \nu)$ must satisfy the following functional equation:

$i = 1, \ldots, N$ (5.5.2)

The arguments of Sections 5.1 and 5.2 required that the discount factor $\beta$ be less than unity and, therefore, are not directly applicable to Model III. The existence and properties of solutions to (5.5.2) are matters for future investigation.

Equation (5.5.2) yields the supremum of the expected reward over a period with terminal phase of length $\nu$ and also yields a decision for the current transition interval. This decision, which is either the selection of an alternative to be sampled or a terminal policy, will be called a $\nu$-optimal decision. A $\nu$-optimal decision which is the same for all $\nu$ sufficiently large will be called an optimal decision. Since, for large $\nu$, every $\nu$-optimal decision maximizes the expected gain and also maximizes the total reward over the sampling period, it is seen that an optimal decision, as defined here, satisfies the criterion set forth at the beginning of the section. The existence and nature of optimal decisions have not yet been investigated.

5.5.2 Model IV. We now assume that the decision-maker can sample or use the process at any time prior to the terminal decision, independently of his past decisions. Let $v_i(\psi; \nu)$ be the supremum of the expected reward over a period with terminal operation phase of length $\nu$ when the system starts in state $i$ with the prior distribution $H(P \mid \psi)$. It is assumed that $H(P \mid \psi) \in \Psi$, a family of distributions closed under $\nu$-step sampling. Following the arguments of Section 5.3 and of the previous paragraph, it is seen that $v_i(\psi; \nu)$ must satisfy the following functional equation:



v 4 cr$v) 



jj;^: ep2f3i 






1»&p ••♦t H ( 5*5*3) 

The remarks made above concerning $\nu$-optimal decisions and optimal decisions apply as well to Model IV.

By approaching optimal decisions for the undiscounted terminal control models by means of $\nu$-optimal decisions we have emphasized the fact that an undiscounted infinite horizon model is an approximation to a system which runs for a long, but finite, period. We can equally well view the undiscounted model as an approximation to a system with a discount factor very close to unity. Thus, another approach to the solution of the undiscounted terminal control problem is to let $\beta \to 1$ in Models I and II. The existence and properties of solutions obtained in this manner and their relation to solutions obtained via $\nu$-optimal decisions have not yet been investigated.



In many processes which can be modeled as a Markov chain with alternatives there is a cost associated with changing alternatives in each state. Such a set-up cost could include, for example, the cost of starting the operation of alternative $k$ and of shutting down alternative $j$ when in state $i$. Set-up costs can easily be introduced into Models I and II; we illustrate how this is done in Model I for a fixed cost, $S$, which is incurred for each change of alternative made. The method is easily generalized to the case in which $S$ is a function of the state in which the change is made and of the alternatives involved in the change, and is also applicable to the adaptive control model of Chapter 3.

Let $\sigma = (\sigma_1, \ldots, \sigma_N)$ denote the policy under which the system is currently operating, where $\sigma_i$ is the index of the alternative in use in the ith state ($\sigma_i = 1, \ldots, K_i$). We now define the generalized state of the system as $(i, \psi, \sigma)$, where $i$ is the physical state of the system ($i = 1, \ldots, N$), $\psi$ indexes the prior distribution of $\tilde{P}$ ($\psi \in \Psi$), and $\sigma$ is the policy currently in use ($\sigma \in \Sigma$). Let $v_i(\psi, \sigma)$ be the supremum of the expected total discounted reward over an infinite period if the system starts in the generalized state $(i, \psi, \sigma)$. The prior distribution function of $\tilde{P}$ is assumed to belong to a family of distributions closed under consecutive sampling.

If the system is in state $(i, \psi, \sigma)$ and it is decided to sample alternative $k$, the supremum of the expected reward is

(^-DS ♦ e*<t> - PS£<*> ♦ P £ ^(t^C^Wt^W) «•*•*> 



pl 



e 9 

«&er©Xj(k) » ( ^,t •••# <*".,)* deflaed tar 



>i<&9* 






a p i 



«» k* 



a » 1 



(5.6.2) 



is the new policy vector when alternative $k$ is chosen in state $i$, and $\delta_{ij}$ is the Kronecker delta. The quantities $q^k_i(\psi)$ and $T^k_{ij}(\psi)$ are defined by equations (3.1.4) and (5.1.3), respectively.

If it is decided to cease sampling when the system is in state $(i, \psi, \sigma)$ and operate indefinitely under the policy $\sigma' = (\sigma'_1, \ldots, \sigma'_N)$, the expected reward is

$S \sum_{j=1}^{N} (\delta_{\sigma_j \sigma'_j} - 1) + v_i(\sigma', \psi),$ (5.6.3)

where $v_i(\sigma', \psi)$ is the prior expected discounted reward for operating the system over an infinite period under the policy $\sigma'$, starting from state $i$, when $H(P \mid \psi)$ is the prior distribution of $\tilde{P}$.

Thus, in Model I with a fixed set-up cost, $v_i(\psi, \sigma)$ must satisfy the following functional equation:



^( tt£) a aax 






JLeE 



N 

S £ 



'^•D^v 4 (2:%t)i 






CHAPTER 6

DISTRIBUTION THEORY

In this chapter we introduce some probability mass functions and density functions which will be required for the next chapter, where we do the prior-posterior and preposterior analysis of a Markov chain observed under the consecutive sampling rule. The Whittle, Whittle-1, and Whittle-2 probability mass functions are defined in Section 6.1 and formulas for their moments are derived. The multivariate beta density function is considered in Section 6.2 and is used to define the matrix beta density function in Section 6.3. Some extensions of the matrix beta distribution are considered in Section 6.4 and the chapter concludes with a discussion of the beta-Whittle probability mass function.

The multivariate beta density function, as defined by equation (6.2.2) below, was introduced by Mauldon [30] in 1959; Mosimann [31] has studied the main properties of this distribution. The matrix beta distribution was used by Silver [3?], but not under that name. The Whittle and beta-Whittle distributions are original with the present work.

Let $x_n = (x_0, x_1, \ldots, x_n)$ be the sequence of consecutive observations of the states of a Markov chain over a period of $n$ transitions, where $x_0 = u$ is the state of the system prior to the first transition. The range set of the random variables $\tilde{x}_i$ is the set of integers which index the states of the chain, $\{1, \ldots, N\}$. It is assumed that the transitions are governed by a known $N \times N$ stochastic matrix, $P = [p_{ij}]$, and that the distribution of the initial state, $u$, is a known stochastic row vector, $p = (p_1, \ldots, p_N)$.

Given a sample outcome, $x_n$, we define the statistic $f_{ij}$ as the number of indices $m \in \{0, 1, \ldots, n-1\}$ such that $x_m = i$ and $x_{m+1} = j$ ($i, j = 1, \ldots, N$). In other words, $f_{ij}$ is the number of occurrences of a transition from state $i$ to state $j$ in the sample $x_n$. Let $F = [f_{ij}]$, an $N \times N$ matrix, be the transition count of the sample. Prior to the observation of $x_n$, $\tilde{F}$ and $\tilde{u}$ are random quantities whose joint distribution is studied in this section.

Let

$f_{i\cdot} = \sum_{j=1}^{N} f_{ij}, \qquad i = 1, \ldots, N$ (6.1.1)

and

$f_{\cdot j} = \sum_{i=1}^{N} f_{ij}, \qquad j = 1, \ldots, N$ (6.1.2)

be the row and column sums of $F$. With the exception of the initial and final transitions, every transition into state $i$ in the sample $x_n$ must be followed by a transition out of state $i$. Therefore, the elements of $F$ are constrained by the equations

$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N$ (6.1.3)

where $u = x_0$ is the initial state and $v = x_n$ is the final state.
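
The transition count and the constraint (6.1.3) can be checked on a short sample path. The following is a minimal sketch; the function names and the six-observation path are hypothetical, and states are labelled $1, \ldots, N$ as in the text.

```python
def transition_count(path, n_states):
    """Return F = [f_ij], the number of i -> j transitions in the path."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        counts[a - 1][b - 1] += 1  # states are labelled from 1
    return counts


def flow_imbalance(counts):
    """Return the list of row sum minus column sum, f_i. - f_.i, for each state."""
    n = len(counts)
    return [sum(counts[i]) - sum(row[i] for row in counts) for i in range(n)]


# Hypothetical sample: u = 1, v = 2, n = 5 transitions among N = 3 states.
path = [1, 2, 2, 3, 1, 2]
F = transition_count(path, 3)
imbalance = flow_imbalance(F)
```

For this path the imbalance is $+1$ at the initial state, $-1$ at the final state and zero elsewhere, which is $\delta_{iu} - \delta_{iv}$ with $u = 1$ and $v = 2$.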

The following lemma shows that, given a transition count $F$ and an initial state $u$, the final state of the sample is uniquely determined. A similar derivation shows that, given $F$ and $v$, $u$ is uniquely determined. Lemma 6.1.5, below, shows that $F$ does not necessarily uniquely determine both $u$ and $v$.

Lemma 6.1.1 Let $u \in \{1, \ldots, N\}$ be fixed. If $F$ is an $N \times N$ matrix of non-negative integers which satisfies the equations

$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N$ (6.1.4)

for some integer $v \in \{1, \ldots, N\}$, then $v$ is the only positive integer for which (6.1.4) is true.

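Proof. The proof is by contradiction. Assume that $v$ and $w$ both satisfy (6.1.4). If $u \ne v$, then

$f_{v\cdot} - f_{\cdot v} = \delta_{vu} - \delta_{vv} = -1$

and, also,

$f_{v\cdot} - f_{\cdot v} = \delta_{vu} - \delta_{vw} = -\delta_{vw},$

which implies that $w = v$. If $u = v$ then

$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iu} = 0, \qquad i = 1, \ldots, N$

and

$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iw}, \qquad i = 1, \ldots, N.$

Thus,

$\delta_{iu} = \delta_{iw}, \qquad i = 1, \ldots, N$

and $w = u = v$. Q.E.D.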
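
The lemma suggests a direct way of recovering the final state from $F$ and $u$: test each candidate $v$ against (6.1.4). The sketch below does this; the function name `final_state` is hypothetical and states are labelled from 1.

```python
def final_state(counts, u):
    """Return the unique v with f_i. - f_.i = delta_iu - delta_iv for all i.

    Raises ValueError when the counts are inconsistent with initial state u.
    """
    n = len(counts)
    imbalance = [sum(counts[i]) - sum(row[i] for row in counts) for i in range(n)]
    matches = [
        v for v in range(1, n + 1)
        if all(imbalance[i - 1] == int(i == u) - int(i == v) for i in range(1, n + 1))
    ]
    if len(matches) != 1:
        raise ValueError("counts are not consistent with initial state u")
    return matches[0]


# A closed loop 1 -> 2 -> 1 or 2 -> 1 -> 2 gives the same count matrix.
loop = [[0, 1], [1, 0]]
```

For the loop matrix the function returns 1 when $u = 1$ and 2 when $u = 2$, a small instance of the remark that $F$ alone need not determine both $u$ and $v$.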

Let $I = \{0, 1, 2, \ldots\}$ denote the set of all non-negative integers. For fixed $u \in \{1, \ldots, N\}$, $v \in \{1, \ldots, N\}$, $n \in \{1, 2, 3, \ldots\}$, and $P$, define the following set of $N \times N$ matrices, $F = [f_{ij}]$:

$\Phi(u, v, n, P) = \{\, F : f_{ij} \in I;\ \sum_{i}\sum_{j} f_{ij} = n;\ f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv},\ i = 1, \ldots, N;\ f_{ij} = 0 \text{ if } p_{ij} = 0 \,\}.$ (6.1.5)

Let

$\Phi_N(u, n, P) = \bigcup_{v=1}^{N} \Phi(u, v, n, P), \qquad u = 1, \ldots, N; \quad n = 1, 2, \ldots$ (6.1.6)

It is clear that $\Phi_N(u, n, P)$ is the set of all possible transition counts $F$ which can arise from a sample of $n$ consecutive transitions in a Markov chain with transition matrix $P$ and initial state $u$.



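6.1.1 The Whittle Distribution. The $N \times N$ random matrix $\tilde{F} = [\tilde{f}_{ij}]$ with range set $\Phi_N(u, n, P)$ is said to have the Whittle distribution with parameter $(u, n, P)$ if $\tilde{F}$ has the joint probability mass function

$f_W^{(N)}(F \mid u, n, P) = F^*_{vu} \, \dfrac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N} \prod_{j=1}^{N} f_{ij}!} \, \prod_{i=1}^{N} \prod_{j=1}^{N} p_{ij}^{f_{ij}}, \qquad F \in \Phi_N(u, n, P)$

$\qquad = 0,$ otherwise (6.1.7)

where $u = 1, \ldots, N$, $n = 1, 2, 3, \ldots$, and $P$ is an $N \times N$ stochastic matrix. The index $v$ is the unique solution of the equations

$f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}, \qquad i = 1, \ldots, N$

and $F^*_{vu}$ is the $(v, u)$th cofactor of the $N \times N$ matrix $F^* = [f^*_{ij}]$ defined by

$f^*_{ij} = \delta_{ij} - f_{ij} / f_{i\cdot}, \qquad f_{i\cdot} > 0$ (6.1.8a)

$f^*_{ij} = \delta_{ij}, \qquad f_{i\cdot} = 0.$ (6.1.8b)

Since, in (6.1.7), there may be some $p_{ij} = 0$, we use the convention $0^0 = 1$.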
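
The mass function (6.1.7) can be checked against a direct enumeration of sample paths for a small chain. The sketch below is an illustration under stated assumptions: states are indexed from 0 rather than 1 (which leaves the sign of the cofactor unchanged), exact rational arithmetic is used, and the three-state matrix `P` is a hypothetical example containing zero entries so that the convention $0^0 = 1$ is exercised.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, x in enumerate(m[0]):
        if x != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * x * det(minor)
    return total


def whittle_pmf(F, u, P):
    """Evaluate (6.1.7) for count matrix F and initial state u (0-based)."""
    n = len(F)
    row = [sum(r) for r in F]
    col = [sum(r[j] for r in F) for j in range(n)]
    ends = [v for v in range(n)
            if all(row[i] - col[i] == int(i == u) - int(i == v) for i in range(n))]
    if len(ends) != 1:
        return Fraction(0)
    v = ends[0]
    f_star = [[Fraction(int(i == j)) - (Fraction(F[i][j], row[i]) if row[i] else 0)
               for j in range(n)] for i in range(n)]
    # (v, u)th cofactor: delete row v and column u, apply the sign (-1)^(u+v).
    minor = [r[:u] + r[u + 1:] for k, r in enumerate(f_star) if k != v]
    value = (-1) ** (u + v) * det(minor)
    for i in range(n):
        value *= factorial(row[i])
        for j in range(n):
            value *= Fraction(P[i][j]) ** F[i][j]  # Fraction(0) ** 0 == 1
            value /= factorial(F[i][j])
    return value


def enumerate_counts(u, n_trans, P):
    """Total path probability for every count matrix reachable from state u."""
    n = len(P)
    totals = {}
    for tail in product(range(n), repeat=n_trans):
        path = (u,) + tail
        prob = Fraction(1)
        F = [[0] * n for _ in range(n)]
        for a, b in zip(path, path[1:]):
            prob *= Fraction(P[a][b])
            F[a][b] += 1
        key = tuple(map(tuple, F))
        totals[key] = totals.get(key, Fraction(0)) + prob
    return totals


P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 4), Fraction(0), Fraction(3, 4)]]
brute = enumerate_counts(0, 4, P)
```

Comparing `whittle_pmf` with the entries of `brute` tests the formula term by term; the enumeration grows as $N^n$, so this check is practical only for very small $N$ and $n$.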
We have called the mass function (6.1.7) the Whittle distribution because Whittle [41] was the first to show that

$\sum_{F \in \Phi_N(u, n, P)} f_W^{(N)}(F \mid u, n, P) = 1.$ (6.1.9)

Whittle's derivation of (6.1.9), and subsequent proofs of this relation by Dawson and Good [15] and by Goodman [21], were obtained under the restriction $f_{i\cdot} > 0$ ($i = 1, \ldots, N$). Billingsley [10], in a particularly elegant proof of (6.1.9), did not require this restriction.



for the means, variances, and covariances of the elements of $\tilde{F}$. Before presenting these results, however, it is necessary to summarize certain facts from the theory of matrices.

Let $P$ be an $N \times N$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_N$, assumed to be distinct. Let $g(x)$ be an arbitrary scalar polynomial, $a_0 + a_1 x + \cdots + a_k x^k$, and let $g(P)$ be the corresponding matrix polynomial, $a_0 I + a_1 P + \cdots + a_k P^k$. Sylvester's Theorem states that

$g(P) = \sum_{m=1}^{N} g(\lambda_m) A^{(m)}$ (6.1.10)

where the $N \times N$ matrices $A^{(m)}$ are defined by the expression

$A^{(m)} = \prod_{j \ne m} \dfrac{P - \lambda_j I}{\lambda_m - \lambda_j}.$ (6.1.11)

These matrices have the following properties:

$A^{(i)} A^{(j)} = 0, \qquad i \ne j$ (6.1.12a)

$[A^{(i)}]^2 = A^{(i)}, \qquad i = 1, \ldots, N.$ (6.1.12b)

A formula similar to (6.1.10), called the confluent form of Sylvester's Theorem, is available in the case of repeated eigenvalues.
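
Sylvester's Theorem is easy to verify numerically for a small matrix whose eigenvalues are known in closed form. The sketch below uses a hypothetical two-state stochastic matrix with eigenvalues $1$ and $1/4$ (supplied by hand, not computed) and exact rational arithmetic; the helper names are illustrative only.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def sylvester_components(P, eigenvalues):
    """Matrices A^(m) = prod_{j != m} (P - lambda_j I) / (lambda_m - lambda_j)."""
    n = len(P)
    components = []
    for m, lam_m in enumerate(eigenvalues):
        A = identity(n)
        for j, lam_j in enumerate(eigenvalues):
            if j != m:
                factor = [[(P[r][c] - (lam_j if r == c else 0)) / (lam_m - lam_j)
                           for c in range(n)] for r in range(n)]
                A = matmul(A, factor)
        components.append(A)
    return components


def matrix_power(P, k):
    result = identity(len(P))
    for _ in range(k):
        result = matmul(result, P)
    return result


# Hypothetical chain; trace 5/4 and determinant 1/4 give eigenvalues 1 and 1/4.
P = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 2), Fraction(1, 2)]]
EIGENVALUES = [Fraction(1), Fraction(1, 4)]
A1, A2 = sylvester_components(P, EIGENVALUES)
```

With $g(x) = x^k$ the theorem gives $P^k = A^{(1)} + (1/4)^k A^{(2)}$ for this chain, and both rows of $A^{(1)}$ equal the steady-state vector $(2/3, 1/3)$.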

If $P$ is an ergodic stochastic matrix, i.e., if $P$ is the transition matrix of a single non-periodic Markov chain, then exactly one eigenvalue has the value unity and all other eigenvalues have modulus less than unity. We shall adopt the convention that $\lambda_1$ is the unit root:

$\lambda_1 = 1$ (6.1.13a)

$|\lambda_m| < 1, \qquad m = 2, \ldots, N.$ (6.1.13b)

Then the matrix $A^{(1)}$ is an $N \times N$ matrix each row of which is the steady-state vector $\pi = (\pi_1, \ldots, \pi_N)$ defined by the relation $\pi = \pi P$.

Theorem 6.1.2 If the $N \times N$ random matrix $\tilde{F}$ has the Whittle distribution with parameter $(u, n, P)$, then the expected value of $\tilde{f}_{ij}$ is

$E[\tilde{f}_{ij}] = \bar{f}_{ij}(u, n) = \sum_{k=0}^{n-1} p^{(k)}_{ui} \, p_{ij}, \qquad i, j = 1, \ldots, N; \quad n = 1, 2, 3, \ldots; \quad u = 1, \ldots, N$ (6.1.14)

where $p^{(k)}_{ui}$ is the $(u, i)$th element of $P^k$. If, furthermore, $P$ is ergodic and the eigenvalues, $\lambda_1, \ldots, \lambda_N$, of $P$ are distinct, then the expected value of $\tilde{f}_{ij}$ has the spectral representation

$\bar{f}_{ij}(u, n) = p_{ij} \left[ n \pi_i + \sum_{m=2}^{N} \dfrac{1 - \lambda_m^n}{1 - \lambda_m} \, a^{(m)}_{ui} \right], \qquad i, j = 1, \ldots, N$ (6.1.15)

where $A^{(m)} = [a^{(m)}_{ij}]$ is defined by (6.1.11).



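Proof. Let $f_{ij}(u, n)$ be the number of transitions from $i$ to $j$ in a sample of $n$ transitions which has initial state $u$. Prior to the observation of the sample, $\tilde{f}_{ij}(u, n)$ is a random variable. If the system starts in state $u$ and the first transition is to state $k$, then $f_{ij}(u, n)$ satisfies the equations

$f_{ij}(u, n) = \delta_{ui} \delta_{kj} + f_{ij}(k, n-1), \qquad n = 2, 3, \ldots$ (6.1.16a)

$f_{ij}(u, 1) = \delta_{ui} \delta_{kj}, \qquad i, j = 1, \ldots, N.$ (6.1.16b)

Then $\bar{f}_{ij}(u, n)$ satisfies the equations

$\bar{f}_{ij}(u, n) = \delta_{ui} \, p_{ij} + \sum_{k=1}^{N} p_{uk} \, \bar{f}_{ij}(k, n-1), \qquad n = 2, 3, \ldots$ (6.1.17a)

$\bar{f}_{ij}(u, 1) = \delta_{ui} \, p_{ij}.$ (6.1.17b)

We shall prove inductively that

$\bar{f}_{ij}(u, n) = \sum_{k=0}^{n-1} p^{(k)}_{ui} \, p_{ij}, \qquad i, j = 1, \ldots, N.$ (6.1.18)

Since $p^{(0)}_{ui} = \delta_{ui}$, (6.1.17b) is satisfied by (6.1.18) for $n = 1$. Assume (6.1.18) holds for $n$. Then, using (6.1.17a),

$\bar{f}_{ij}(u, n+1) = \delta_{ui} \, p_{ij} + \sum_{k=1}^{N} p_{uk} \sum_{m=0}^{n-1} p^{(m)}_{ki} \, p_{ij} = \delta_{ui} \, p_{ij} + \sum_{m=0}^{n-1} p^{(m+1)}_{ui} \, p_{ij} = \sum_{m=0}^{n} p^{(m)}_{ui} \, p_{ij},$ (6.1.19)

proving the induction.

If all the eigenvalues of $P$ are distinct, Sylvester's Theorem yields

$p^{(k)}_{ui} = \sum_{m=1}^{N} \lambda_m^k \, a^{(m)}_{ui}, \qquad k = 0, 1, 2, \ldots$ (6.1.20)

If, furthermore, $P$ is ergodic and $\lambda_1 = 1$ is the only eigenvalue of unit modulus, equation (6.1.18) can be written

$\bar{f}_{ij}(u, n) = \sum_{k=0}^{n-1} \sum_{m=1}^{N} \lambda_m^k \, a^{(m)}_{ui} \, p_{ij} = p_{ij} \left[ n \pi_i + \sum_{m=2}^{N} \dfrac{1 - \lambda_m^n}{1 - \lambda_m} \, a^{(m)}_{ui} \right], \qquad i, j = 1, \ldots, N; \quad n = 1, 2, 3, \ldots; \quad u = 1, \ldots, N.$ (6.1.21)

Q.E.D.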
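
Both forms of the mean can be checked on a small example. The following sketch compares the sum (6.1.18), the spectral form (6.1.15) and a brute-force average over all sample paths. The two-state chain, its steady-state vector `PI`, second eigenvalue `LAMBDA_2` and component matrix `A2` are hypothetical values worked out by hand for this chain, and states are indexed from 0.

```python
from fractions import Fraction
from itertools import product


def expected_counts(u, n_trans, P):
    """Mean count matrix from (6.1.18): sum over k of p_ui^(k) * p_ij."""
    n = len(P)
    dist = [Fraction(int(i == u)) for i in range(n)]  # row u of P^0
    occupancy = [Fraction(0)] * n
    for _ in range(n_trans):
        occupancy = [o + d for o, d in zip(occupancy, dist)]
        dist = [sum(dist[k] * P[k][i] for k in range(n)) for i in range(n)]
    return [[occupancy[i] * P[i][j] for j in range(n)] for i in range(n)]


def brute_force_expected_counts(u, n_trans, P):
    """Average the transition counts over every path, weighted by probability."""
    n = len(P)
    mean = [[Fraction(0)] * n for _ in range(n)]
    for tail in product(range(n), repeat=n_trans):
        path = (u,) + tail
        prob = Fraction(1)
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        for a, b in zip(path, path[1:]):
            mean[a][b] += prob
    return mean


# Hypothetical chain with eigenvalues 1 and 1/4.
P = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 2), Fraction(1, 2)]]
PI = [Fraction(2, 3), Fraction(1, 3)]
LAMBDA_2 = Fraction(1, 4)
A2 = [[Fraction(1, 3), Fraction(-1, 3)], [Fraction(-2, 3), Fraction(2, 3)]]


def spectral_expected_counts(u, n_trans):
    """Mean count matrix from the spectral form (6.1.15), for this chain only."""
    scale = (1 - LAMBDA_2 ** n_trans) / (1 - LAMBDA_2)
    return [[P[i][j] * (n_trans * PI[i] + scale * A2[u][i]) for j in range(2)]
            for i in range(2)]
```

The sum form costs $n$ vector-matrix products, whereas the spectral form costs the same for every $n$ once the eigenvalues and component matrices are known, which is the advantage claimed for it when $n$ is large.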



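Theorem 6.1.3 If the $N \times N$ random matrix $\tilde{F}$ has the Whittle distribution with parameter $(u, n, P)$, then, for $\alpha, \beta, \gamma, \delta = 1, \ldots, N$, the covariance between $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ is

If $P$ is ergodic and the eigenvalues of $P$, $\lambda_1, \ldots, \lambda_N$, are all distinct, the covariance of $\tilde{f}_{\alpha\beta}$ and $\tilde{f}_{\gamma\delta}$ is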



afi 



■"tofA^'-.-e'— -*> 






Y t» 



bp»2 3»2 



♦ T> 



•ir ft ir * s I±2sgL CTTCaW ♦ a« ) 

a v sag (lriL.\8 * ■* nr 









•«o [ 



i- 



i - X, 



W 






^•*J 



(6.1 .23) 



when $n > 1$.



,£B88£« If ^ (a f n) is the waaibey of transitions £*«b state a to 
ftp 

state p In a ssn^lo of & tmnsittea vtom the chain starts in state &» 
equation (6*1.16) audi the relation *y+* M & 6 * «\* ^^ ****** ** m ® 






ftpst transition Is t*m u to k» 



1^p(tt»n) *!(*•») * 



VpW^ * Wop 



* & 



?»n»l) ♦ J^Ck»i!^i)?J s (^s>"l)* 



1 9 •••© 



(6«i«2ta) 



* WiaWW 



lot 



v4 (6«1.8ft>) 

Is**** •••9 K 



^ye^ * S [ ^5 <U,R) V Ctt » n) 3. 



(6*1.25) 



Than, ttsing ©qaat&es (6«i*2fe)» it £© s@sn that cr a a v *tet n ) 68t8.sfSL®3 tto 



r aPv5 (U,n) - 



S 



V»w** ♦ S^V 8 *"** * N-ftsW** 11 '** * JL ** ^W^ 1 *^ 






VVaa%* 



ss»i 



(6«i.3Sb) 



We shall show that

s 



V»V-> + 3l & t Wt W W> * «^ V<*<«*»' 



t»4 t ••*, 8 (6.1.S7&} 

sp2 9 3»..« 



■ VA M ' 



®3i # • •»; 



>•£•£?&) 



in which case equation (6.1.22) follows.



It is clear from (6.1.17b) that (6.1.27b) equals (6.1.26b). The case $n = 2, 3, \ldots$ will be proven by induction. For $n = 2$, it is easily verified that (6.1.26a) is equal to the expression in (6.1.27a). Assume (6.1.27a)
satisfies (6.i.?6a) for n* Th©n 9 using (6«i«l*0, 

+ vp« 2 p ~ 1)p «* * 2! Cp £" B)p <*"v M) * cv<* (, -» 

C6.i.?8JL 
proving the induction.

If $P$ is ergodic and has distinct eigenvalues, equations (6.1.15) and (6.1.20) for the spectral representations of $\bar{f}_{ij}(u, n)$ and $p^{(k)}_{ij}$ can be used in (6.1.27a) to obtain, for $n = 2, 3, \ldots$

o{3 Y 6 ap a ^ ira)V ~ua ay P® Y» y m % i«fc " 

(6*1. 

Multiplying out equation (6.1.29) and using the relations



E *\, B ^., n » *"«t • ••* N (6.1.33) 



k«l 




**•! k n-l-nX* ♦ \« 
£ (t»0 o 3 i 9 j»2 f ... t r (6.1.3a) 

k*l J 

l-\ j 

£ * ( ***>(4<! ) ° T- 2 — - 3 1 • (6.1.52) 

J»G)P2» *••» N 

and 

T x in*V*) iUk ) B _© ____£ naS, „ #f M (6.1.33) 

fed « m i-fc 

at 

equation (6.1.23) is obtained. Q.E.D.

In Theorems 6.1.2 and 6.1.3 the spectral representations provide an efficient method of computing the means, variances, and covariances of elements of $\tilde{F}$. This method is particularly useful as the parameter $n$ becomes large and, in fact, leads to relatively simple approximations for $\bar{f}_{ij}(u, n)$ and $\mathrm{cov}[\tilde{f}_{\alpha\beta}, \tilde{f}_{\gamma\delta}]$ when $n$ is sufficiently large, as is shown in the following corollary.

Corollary 6.1.4 If the N x N random matrix F has the Whittle distribution with parameter (u,n,P), where P is ergodic and has distinct eigenvalues, then, for large n, the expected value of $f_{ij}$ and the covariance between $f_{\alpha\beta}$ and $f_{\gamma\delta}$ are given by the following asymptotic expressions:

$$\bar f_{ij}(u,n) \approx p_{ij}\left[\, n\pi_i + \sum_{m=2}^{N} \frac{a_{ui}^{(m)}}{1-\lambda_m} \right], \qquad i,j = 1, \ldots, N \tag{6.1.34}$$



» -Co) m _ f A (ra) Cm). _ f (®) C®k 

p ^W?e £ ££ "* WN* j^ (Wk )8 



^ai 



m2 Jb2 (t-ViX*-*^) 



<*»P#Y»6 * *• •••• 



£BSfi£* Eq&ations (6.1. 3k) assd (6«1«35) a?© obtained £$r letting a 
beseaa largo In C6«i»&5) s^ (6,1.23), dropping tarsia of order 
(b*sS b ...» R) t end noting that 



■ 



n*?~ «»5 8 0« W"2» ... t B (6.i«&) 



Q.E.Di 
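The large-n behaviour of the mean transition count can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes a two-state chain with transition matrix [[1-a, a], [b, 1-b]], for which the steady state vector, the second eigenvalue and the spectral matrix are available in closed form, and it assumes that the asymptotic mean is the exact sum with the terms of order $\lambda_m^n$ dropped. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expected_counts(P, u, n):
    """Exact E[f_ij] = sum_{k=0}^{n-1} (P^k)[u][i] * P[i][j], start state u (0-based)."""
    N = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]  # P^0
    visits = [0.0] * N  # expected number of departures from each state
    for _ in range(n):
        for i in range(N):
            visits[i] += power[u][i]
        power = mat_mul(power, P)
    return [[visits[i] * P[i][j] for j in range(N)] for i in range(N)]


def asymptotic_counts_two_state(a, b, u, n):
    """Large-n approximation for P = [[1-a, a], [b, 1-b]] (assumed form)."""
    s = a + b
    P = [[1 - a, a], [b, 1 - b]]
    pi = [b / s, a / s]                       # steady state vector
    lam2 = 1 - s                              # second eigenvalue
    A2 = [[a / s, -a / s], [-b / s, b / s]]   # spectral matrix belonging to lam2
    return [[P[i][j] * (n * pi[i] + A2[u][i] / (1 - lam2)) for j in range(2)]
            for i in range(2)]


a, b, u, n = 0.3, 0.2, 0, 50
exact = expected_counts([[1 - a, a], [b, 1 - b]], u, n)
approx = asymptotic_counts_two_state(a, b, u, n)
max_gap = max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))
print(exact)
print(approx)
print(max_gap)
```

For this chain the second eigenvalue is 0.5, so the neglected terms are of order $0.5^{50}$ and the two tables agree to within rounding error.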



6.1.3 The Whittle-1 Distribution. Let u be a random integer with range set {1, ..., N} and let F be an N x N random matrix with range set $\Phi_N(u,n,P)$. The ordered pair (u,F) is said to have the Whittle-1 distribution with parameter (p,n,P) if (u,F) has the joint probability mass function

$$f_{W1}^{(N)}(u,F \mid p,n,P) = \begin{cases} p_u\, f_W^{(N)}(F \mid u,n,P), & u = 1, \ldots, N,\ F \in \Phi_N(u,n,P) \\ 0, & \text{otherwise} \end{cases} \tag{6.1.37}$$

where $p = (p_1, \ldots, p_N)$ is a stochastic row vector, n = 1, 2, 3, ..., and P is an N x N stochastic matrix.



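Since $p_u \ge 0$ and $\sum_{u=1}^{N} p_u = 1$, it is clear that

$$f_{W1}^{(N)}(u,F \mid p,n,P) \ge 0$$

and, using equation (6.1.9),

$$\sum_{u=1}^{N} \sum_{F \in \Phi_N(u,n,P)} f_{W1}^{(N)}(u,F \mid p,n,P) = 1. \tag{6.1.38}$$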



It is readily seen that, if (u,F) has the Whittle-1 distribution with parameter (p,n,P), the marginal distribution of u is

$$P[u \mid p] = \begin{cases} p_u, & u = 1, \ldots, N \\ 0, & \text{otherwise} \end{cases} \tag{6.1.39}$$

The marginal distribution of F is considered in the remaining paragraphs of this section.



6.1.4 The Whittle-2 Distribution. Let F be an N x N random matrix with range set

$$\Phi_N(n,P) = \bigcup_{u=1}^{N} \Phi_N(u,n,P), \qquad n = 1, 2, 3, \ldots \tag{6.1.40}$$

The Whittle-2 distribution with parameter (p,n,P) is defined as the marginal distribution of F when (u,F) has the Whittle-1 distribution with parameter (p,n,P):

$$f_{W2}^{(N)}(F \mid p,n,P) = \begin{cases} \sum_{u=1}^{N} p_u\, f_W^{(N)}(F \mid u,n,P), & F \in \Phi_N(n,P) \\ 0, & \text{otherwise} \end{cases} \tag{6.1.41}$$

where p is an N-dimensional stochastic row vector, n = 1, 2, ..., and P is an N x N stochastic matrix.

It is clear from the definition (6.1.41) and the fact that $f_{W1}^{(N)}(u,F \mid p,n,P)$ is a probability mass function that

$$f_{W2}^{(N)}(F \mid p,n,P) \ge 0$$

and

$$\sum_{F \in \Phi_N(n,P)} f_{W2}^{(N)}(F \mid p,n,P) = 1.$$



Before deriving an explicit formula for $f_{W2}^{(N)}(F \mid p,n,P)$, a preliminary lemma is required. To this end let $\Phi_N(n,P)$ be partitioned into two sets, $\Phi_{N1}(n,P)$ and $\Phi_{N2}(n,P)$, defined by equations (6.1.42) and (6.1.43): $\Phi_{N1}(n,P)$ is the set of all transition counts which start and end in the same state and $\Phi_{N2}(n,P)$ is the set of all other transition counts in $\Phi_N(n,P)$.

Lemma 6.1.5 Let the sets $\Phi_{N1}(n,P)$ and $\Phi_{N2}(n,P)$ be defined by equations (6.1.42) and (6.1.43). If $F \in \Phi_{N1}(n,P)$ there are exactly N pairs of integers, (x,y) = (u,u), u = 1, ..., N, which satisfy the equations

$$f_{i.} - f_{.i} = \delta_{ix} - \delta_{iy}, \qquad i = 1, \ldots, N. \tag{6.1.44}$$

If, on the other hand, $F \in \Phi_{N2}(n,P)$ there is a unique solution, (x,y) = (u,v), where u ≠ v, to (6.1.44).

Proof. If $F \in \Phi_{N1}(n,P)$,

$$f_{i.} - f_{.i} = 0, \qquad i = 1, \ldots, N$$

and (6.1.44) becomes

$$\delta_{ix} = \delta_{iy}, \qquad i = 1, \ldots, N.$$

These equations are satisfied by x = y = u (u = 1, ..., N) and are not satisfied by any pair (x,y) such that x ≠ y.

If $F \in \Phi_{N2}(n,P)$ there is, by the definition of $\Phi_{N2}(n,P)$, at least one solution, (u,v), to (6.1.44) with u ≠ v. Assume (u',v') also satisfies (6.1.44). If u' = u, Lemma 6.1.1 implies v' = v. Assume u' ≠ u. Then, if v' ≠ v, substitution of (u,v) in (6.1.44) yields, for i = v,

$$f_{v.} - f_{.v} = -1$$

while (u',v') substituted into (6.1.44) gives

$$f_{v.} - f_{.v} = \delta_{vu'} \ge 0,$$

a contradiction. If v' = v, then (u,v) substituted into (6.1.44) with i = u implies

$$f_{u.} - f_{.u} = 1$$

and, since v = v' ≠ u, (u',v) substituted into (6.1.44) yields

$$f_{u.} - f_{.u} = \delta_{uu'},$$

which contradicts the assumption u' ≠ u. Thus, (u',v') = (u,v). Q.E.D.
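The balance equations (6.1.44) are easy to exercise on a concrete sample path. The sketch below is illustrative only: the helper names are hypothetical, states are indexed from 0, and the matrix passed to `endpoints` is assumed to be a genuine transition count. It tallies the count matrix of a path and recovers the initial and final states from row sums minus column sums, returning `None` when the path is closed and every pair (u,u) satisfies the equations.

```python
def transition_counts(path, n_states):
    """Count i -> j transitions along a path of 0-based state indices."""
    F = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(path, path[1:]):
        F[i][j] += 1
    return F


def flow_balance(F):
    """Return f_i. - f_.i (row sum minus column sum) for every state i."""
    N = len(F)
    return [sum(F[i]) - sum(F[k][i] for k in range(N)) for i in range(N)]


def endpoints(F):
    """Recover (u, v) from F, or None when the path is closed (u = v)."""
    balance = flow_balance(F)
    if all(b == 0 for b in balance):
        return None            # any (u, u) satisfies the balance equations
    return balance.index(1), balance.index(-1)


open_path = [0, 1, 2, 1, 0, 2]      # starts in state 0, ends in state 2
closed_path = [0, 1, 2, 0]          # starts and ends in state 0
F_open = transition_counts(open_path, 3)
F_closed = transition_counts(closed_path, 3)
print(flow_balance(F_open), endpoints(F_open))
print(flow_balance(F_closed), endpoints(F_closed))
```

The open path gives a balance vector with a single +1 (initial state) and a single -1 (final state), which is the unique solution asserted by the lemma; the closed path gives the zero vector.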

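Theorem 6.1.6 Let F be an N x N random matrix with range set $\Phi_N(n,P)$ which has the Whittle-2 distribution with parameter (p,n,P). Then the probability mass function of F is given by

$$f_{W2}^{(N)}(F \mid p,n,P) = \begin{cases} \left(\sum_{i=1}^{N} p_i F^*_{ii}\right) \dfrac{\prod_{i=1}^{N} f_{i.}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, & F \in \Phi_{N1}(n,P) \\[2ex] p_u F^*_{vu}\, \dfrac{\prod_{i=1}^{N} f_{i.}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!} \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}, & F \in \Phi_{N2}(n,P) \\[2ex] 0, & \text{otherwise} \end{cases} \tag{6.1.45}$$

where $F^*_{xy}$ is the (x,y)th cofactor of the matrix $F^*$ defined by (6.1.8) and, when $F \in \Phi_{N2}(n,P)$, (u,v) is the unique solution to equation (6.1.44).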

Proof. By definition, $\Phi_{N1}(n,P)$ and $\Phi_{N2}(n,P)$ are mutually exclusive sets and together exhaust the range set $\Phi_N(n,P)$. If $F \in \Phi_{N1}(n,P)$ then, by Lemma 6.1.5, $F \in \Phi_N(i,n,P)$, i = 1, ..., N, and

$$f_W^{(N)}(F \mid i,n,P) > 0, \qquad i = 1, \ldots, N,$$

which yields the first line of equation (6.1.45). If $F \in \Phi_{N2}(n,P)$, Lemma 6.1.5 implies there is exactly one value of u in the range {1, ..., N} such that

$$f_W^{(N)}(F \mid u,n,P) > 0,$$

which yields the second line of (6.1.45). Q.E.D.



6.1.5 Moments of the Whittle-2 Distribution. In this paragraph we derive formulas for the expected value of $f_{ij}$ and for the covariance between $f_{\alpha\beta}$ and $f_{\gamma\delta}$ when F has the Whittle-2 distribution with parameter (p,n,P). When P is ergodic and p = π, the steady state distribution corresponding to P, particularly simple formulas result. Related moments have been derived by other authors. Anderson and Goodman [1], assuming that many Markov chains which are governed by the same transition matrix are simultaneously observed, find expressions for the means, variances, and covariances of $f_{ij}(t)$, the number of systems making a transition from state i to state j on the t-th transition (i,j = 1, ..., N). Good [20] has derived formulas for the mean vector and variance-covariance matrix of the vector $(f_{1.}, \ldots, f_{N.})$, where

$$f_{i.} = \sum_{j=1}^{N} f_{ij}, \qquad i = 1, \ldots, N$$

is the number of times the system is observed to be in state i (counting the initial state, but not the final state) in a sample of n consecutive transitions, assuming that the distribution of the initial state is π, the steady state distribution corresponding to P. Our equations (6.1.53) and (6.1.54), when summed over j and over β and δ, respectively, reduce to Good's formulas.



jULsg Lot the N x R randcxa aatirts g have the WM.U1©»S 

distribution i&th paaaaeter (?»n»£)« Then the expected <sala© of f 

• i3 



is 



t*i n 



.(fe> 



If P is eygodio and has distinct eigenvalues* X«* ...» X , than S It 2 

has th® spectral representation 

~ R S.X a H , m * 

SO&J ■» B^. + p^ Z , „g.„ 2 pajf't lO-l* ... V R (6.i.&?) 

*S i^ ij QSS2 i~& Wat « ^ 

n 



rahere th® RxN satKioes a » [a. . ] a?® defined by CS.JUit). 
£goa£« Sine® 

SC*«J ■ s P/ 14 <«»«>t (6.1.^8) 

aqoatlon (6.I.&5) follows iaoedl&tely fgoa scpat&en (6«i»3&)» When P is 

®?godio idth distinct eigeKf^alBea* C6.1.4?) follows ftoa (6«i»&5)* Q»£<H* 

ih^prem fi^ft ^ Let th® NiS rente* ssatrix F have th® Whifcile»2 
distribution ^Ath pasoEst®? (p»ntjp« then* Sbr GtP»Y»6 » i« «••» R» th® 

©ovarian©® batseen t~ n and $ „ is 






d»1 



«**VV - «^< VPT«^3) * £ Cp^O.k) 



to*i 









» EC? JK« •«.-<?' .1). 

oP ayp6 y* 



w*i 



(6*l*t$a) 



(6.t,4j9b) 



X? Pis 



o «Lth dlsttnot e&gfflsws3a«s t X > •«.» V - and If 

A H 



b (l3) n p£ m} <* (bJ a J ..., b^ a) ) f 8te2, .♦., H (6.1. 



A*~" is detflRfld %jr etpat&oa (6.1. 11), then (6.l.**9a) has the spestml 

TCpMMBtttttoil 






,n 



K *■** (a) (») 






H 






pv»5. SI 

♦TrJb< B W a) >]* £ BL 



V * oft 



0»2 <i-x ) 



r- - 



» w *j ^in»«i nr"i tfH 

op2 #1 i-X, 



■X 



n»l 






. /•»»! . n»iv 
x 4 (x • X. ) 

W IS V 



!* 



**-* 



5 



(6*1*51) 



jggSQg* Since 



■«w- J^nW" 0, 



(6.1*52) 



squatton (6*1.49) £©H©ms issBedJLat^ taxa (6.1*2?) • % tsa&ng (6*1.52) 
together ^Lth (6.1*30) aai the spetoteai ffaprosssifcatiG&a (6.1.29) e e^aatisn 
(6.1. 51) is obtained. Q.E.D* 

Corollary 6.1.9 Let F be an N x N random matrix which has the Whittle-2 distribution with parameter (π,n,P), where P is ergodic and π is the steady state distribution corresponding to P. Then

$$E[f_{ij}] = n\,\pi_i\, p_{ij}, \qquad i,j = 1, \ldots, N \tag{6.1.53}$$

and, for α, β, γ, δ = 1, ..., N,

«*• r vo ] - 

xpS t 3 v *«« (6.1*f$a> 
If P has dlstfijset e&geovalaea* agnation (6*i*5«&&) lias the epsetaral 



w C V V 






J&g@£« Sins© 



n«2,3 9 .#. (6*1*35) 



8 fit} 

s ^JPLi " ^ WfclA... (6.1.56) 

IP* Uts2 * * 1*1, ...» R 



sspat&on (6.1.53) fls>3ttowB %m®$&At£l& &rcsaa (6.1*46) • Using (6.1.56) in 
©qeafcta (6.1.49a), 




Keting that, If n > 1, 

n»i „ n-1 te-1 / m ) 

£ f 41 (u»!s) ■ £ £ &£ p. 

n»i n-i f„> 
» £ 2 p\, p. 4 
ortlcnfrl * iJ 

»»1 (ra) 

m p.* E (a»i*ia) p 4 i (6*1 .58) 

03, 



equation (6.1.5^) foiled txm (6.1.57). 

It P has dlst&aot eigenvalues, thon, bur (6.1,12a) , 



5 w»l tt WJ J*i, .... 

and. Is this ease, equation (6*1*51) redaess to (6*1*55). Q.E.D* 
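The simplification obtained when the initial state is drawn from the steady state distribution can be illustrated numerically. The sketch below is ours and rests on stated assumptions: a hypothetical three-state ergodic matrix, a steady state vector obtained by repeated multiplication rather than by solving the balance equations, and the mean of the Whittle-2 count computed as the mixture over initial states of the fixed-start means. It compares that mean with $n \pi_i p_{ij}$.

```python
def step(dist, P):
    """Advance a row distribution one transition: dist * P."""
    N = len(P)
    return [sum(dist[k] * P[k][i] for k in range(N)) for i in range(N)]


def whittle2_mean(p, n, P):
    """E[f_ij] when the initial state has distribution p and n transitions are observed."""
    N = len(P)
    dist = list(p)
    visits = [0.0] * N      # expected number of departures from each state
    for _ in range(n):
        for i in range(N):
            visits[i] += dist[i]
        dist = step(dist, P)
    return [[visits[i] * P[i][j] for j in range(N)] for i in range(N)]


def steady_state(P, iters=2000):
    """Approximate the steady state vector by repeated multiplication."""
    N = len(P)
    dist = [1.0 / N] * N
    for _ in range(iters):
        dist = step(dist, P)
    return dist


# Hypothetical ergodic transition matrix used only for illustration.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
n = 7
pi = steady_state(P)
mean = whittle2_mean(pi, n, P)
target = [[n * pi[i] * P[i][j] for j in range(3)] for i in range(3)]
worst = max(abs(mean[i][j] - target[i][j]) for i in range(3) for j in range(3))
print(mean)
print(worst)
```

Because the state distribution does not change from one transition to the next when p = π, each of the n transitions contributes $\pi_i p_{ij}$ to the expected count, which is what the comparison confirms.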



6.2 The Multivariate Beta Distribution.

In this section we consider the multivariate beta distribution, which is an extension of the beta distribution to N dimensions. There are several different generalizations of the beta distribution; this particular one is due to Mauldon [30]. The moments of this distribution have been derived by Mosimann [31], who also relates the multivariate beta distribution to the gamma distribution. Some of Mosimann's results are presented here for the sake of completeness; the proofs, for the most part, are original.



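6.2.1 The Multivariate Beta Density Function. The random stochastic vector $p = (p_1, \ldots, p_N)$ is said to have the multivariate beta distribution with parameter m if p has the joint density function

$$f_\beta^{(N)}(p \mid m) = \begin{cases} B_N(m) \prod_{i=1}^{N-1} p_i^{m_i-1} \left(1 - \sum_{k=1}^{N-1} p_k\right)^{m_N-1}, & p_i > 0\ (i = 1, \ldots, N),\ \sum_{i=1}^{N} p_i = 1 \\ 0, & \text{elsewhere} \end{cases} \tag{6.2.1}$$

where $m = (m_1, \ldots, m_N)$ with

$$m_i > 0, \qquad i = 1, \ldots, N \tag{6.2.2}$$

and, if $\Gamma(x)$ is the gamma function and

$$M = \sum_{i=1}^{N} m_i, \tag{6.2.3}$$

the normalizing constant is

$$B_N(m) = \frac{\Gamma(M)}{\prod_{i=1}^{N} \Gamma(m_i)}. \tag{6.2.4}$$

It is to be noted that $f_\beta^{(N)}(p \mid m)$ is the joint distribution of N-1 of the elements of p, the Nth element being determined by the constraint

$$\sum_{i=1}^{N} p_i = 1. \tag{6.2.5}$$

The following lemma provides an alternate representation of the normalizing constant, $B_N(m)$.

Lemma 6.2.1 If $B_N(m)$ is defined by (6.2.4), then

$$B_N(m) = \left[ B\!\left(m_1, \sum_{k=2}^{N} m_k\right) B\!\left(m_2, \sum_{k=3}^{N} m_k\right) \cdots B(m_{N-1}, m_N) \right]^{-1}, \tag{6.2.6}$$

where B(m,n) is the beta function.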



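Proof. From (6.2.4),

$$B_N(m) = \frac{\Gamma(M)}{\Gamma(m_1)\,\Gamma\!\left(\sum_{k=2}^{N} m_k\right)} \cdot \frac{\Gamma\!\left(\sum_{k=2}^{N} m_k\right)}{\Gamma(m_2)\,\Gamma\!\left(\sum_{k=3}^{N} m_k\right)} \cdots \frac{\Gamma(m_{N-1} + m_N)}{\Gamma(m_{N-1})\,\Gamma(m_N)}$$

$$= \left[ B\!\left(m_1, \sum_{k=2}^{N} m_k\right) B\!\left(m_2, \sum_{k=3}^{N} m_k\right) \cdots B(m_{N-1}, m_N) \right]^{-1}. \tag{6.2.7}$$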



Q.E.D. 
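The two representations of the normalizing constant can be compared numerically. This is a small sketch using only the standard library; the function names and the parameter vector are hypothetical choices made for illustration.

```python
import math


def beta_fn(x, y):
    """Beta function B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def norm_const_gamma(m):
    """B_N(m) = Gamma(M) / prod_i Gamma(m_i), with M = sum(m)."""
    return math.gamma(sum(m)) / math.prod(math.gamma(mi) for mi in m)


def norm_const_beta_product(m):
    """Reciprocal of prod_{i=1}^{N-1} B(m_i, m_{i+1} + ... + m_N)."""
    product = 1.0
    for i in range(len(m) - 1):
        product *= beta_fn(m[i], sum(m[i + 1:]))
    return 1.0 / product


m = [2.0, 3.5, 1.0, 4.0]
print(norm_const_gamma(m), norm_const_beta_product(m))
```

The agreement reflects the telescoping of the gamma functions in the product of beta functions: each tail sum appears once in a numerator and once in a denominator.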



Since B(m,n) > 0, it is clear that

$$f_\beta^{(N)}(p \mid m) \ge 0.$$

In the next theorem it is established that $\int f_\beta^{(N)}(p \mid m)\, dp = 1$. It then follows that the multivariate beta density function, as defined by equation (6.2.1), is a proper density function.

Theorem 6.2.2 If $f_\beta^{(N)}(p \mid m)$ is defined by equation (6.2.1),

$$\int f_\beta^{(N)}(p \mid m)\, dp = 1, \tag{6.2.8}$$

where $dp = dp_1 \cdots dp_{N-1}$.

Proof. The theorem is proved by induction on N. For N = 2, $f_\beta^{(2)}(p \mid m)$ is the univariate beta density function and (6.2.8) holds. Assume (6.2.8) is true for N. Then,

/f^ 1) (p|m)d S *B <■> / H pV^- * pJ^fW..^, 




L@t us &@feo the tafcegraad tra&sfbsia&f&on 



H-l 



p. » (p«» •••# p«-» i - £ a) 



t3h«P© 



P« 



flha Jaoot&as* Is 



1- E S3l 



dp K-l 



t©i 



v 



Bating that P^^ s gbA &d % g sasd itottiag 

m % * (ay .... ^ ® g + ^) 



S g tt <V *w«>» 



oqaaUon (6.2.9) 



(1- E pJ 13 W *>«"*^f 



(6,2. 



(6.2„JIS) 







1. 



(6.2.12} 



Q.E.D, 



The method by which Theorem 6.2.2 was proved can be generalized to provide an identity which will be useful in subsequent proofs.



lfo»if« ^i ^fl l Let ^ be e rendoa stoehasti© vector with the mslti- 
variete beta distribution* f g (p | a), let yt* and v be positive integers 
et&sh that /*+ v o J? and oak® th© transformation 



where 



Let 



rsr. ..., fr Mf i- e to 



» » . *> - 1 
I f 



*s _ *<** 



xj-1 



2 « Gy ...» «^ lt i- j^ t^) 



(6.2.11a) 
(6.2.S3b) 



1- 2 % 

ted K 



i»i 9 ...» v»l 



1 Mi /^^ 



(6.2.1%) 



*2 /<*1 B 

Th®n the joint distribution of (§t n) is 



>* 1 >, 



.<*>, 



2» 2 1 2i # sp " f f ( S I % } f p v ' ( s I sp* 



(6.2.i**b) 



>.2„ 



*+* . •w 



• since ®<3L* i (i»i» •♦*» v»i)» the range sets of q and ta 



_ ^ and ^. o » the «?aoo*a.an of the iransfbraa&tion 



i.2.13) is 



RlAfl» •••• Pi 



9*n 



j . . «t n 



"vi, 



*5Wt 



9n 



J 



■ (1 - T a ) . (6.2.16) 



taking the transformation (6.2.13) in (6.2.1) , 



v 



~,mT l 



. 2l V %> " *♦»<%> W i %**« * 1 \> 



*•* n „^<ri ^* B «-* 

x n V ^ (1- eu)» 



» f^^Cq | « 4 ) fJ V> ( 8 | g 2 ). (6.2.1?) 



Q.S.D. 



6.2.2 The Multivariate Beta Distribution Function.

If the random stochastic vector p has the multivariate beta distribution with parameter m, then the multivariate beta distribution function is denoted $F_\beta^{(N)}(p \mid m)$, where, if $p = (p_1, \ldots, p_N)$,



f p <pls>-p!:if l6 p 1 v ^j 



/.V^^sl-^ 



(6.2.18) 



As is the case with the beta distribution function, there is no closed expression for equation (6.2.18). We can, however, express $F_\beta^{(N)}(p \mid m)$ as an (N-1)-fold infinite sum. The case N = 3 is illustrated here. For notational simplicity let the parameter vector be

a® («M%v)« C6.2.S9) 

Then 






F< 3) (p| a) • B 3 (®)J P4 - J%£ % £\t^*jT d^d^. (6.8.90) 



If < p + p > < t 9 «e haw© G-cq^ + Qg^iin tfce range ©£ Ij&egsatlon and 



tli© eeslee eoampgtng unifemly* Sinee kill positive integer* w as®* 

k 
cKpesud (q^ 4- eu) la s finite t&noaial series, 

Is 
<<S i + V £ ( v> ^V (6.2.22) 






Hue* 

3 te>=0 vcO 

• B « ? £ (^)Cv>W) k ^ J—. (6,2.23) 

Let (sV denote H&® feg^peygeaasetsio oeeSElc&enfc 

(as) s ssCz ♦ i) ••• (x ♦ fe -1) kBi 9 ^«»« 

Is 

■ i e fc«0 (6.2.3t) 

ifaepe % Is best real sssdbop. ^®a 

M) k (^6 - , W . k (6.M5) 

(te»v)tvl 

and (6#2.23) becosaos 

a « k (i-v) fe Jp p* M _ 

Hews&ng the ©pdes» of smaaatiea end noting that 

a(a + %) 

a * k » 




P*VB .,,, * 

*e have 

A ft f-? ?"* 

(35 Pi $ *° °° ^) w >) fo (PL P« P* 

r^Cp s) «* B (m) -2u2 S £ ^ i fe M * 3.. (6.2.2?) 

P-" 3" a$ vpQteX) (o*l) ((Hi) k8 v8 

the <3a«ibX© infinite seiies of (6.2.2?) is P (VftafPtAtlf ^if p«» pg) 8 
Appell's second hypergeometric function of two variables [2]. Appell has shown that the double series of (6.2.27) converges absolutely whenever $p_1 + p_2 < 1$. Thus, we have

a 6 

(3) ^i^ 

• * ft <P I 9) s B ^i> -% £ - PJ*-V»«»P»*fri»P+M P«»PJ» P*/fi <* 

p ~ 3 c$ 2 *z „. 1»3 

p^p 2 <i 
(6.2.23) 
The question of convergence of (6.2.27) on the boundary of the region $p_1 + p_2 \le 1$ has not yet been resolved. We note, however, that equation (6.2.28) remains valid when $p_1 + p_2 = 1$.

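6.2.3 The Nonstandardized Multivariate Beta Distribution. Let $q = (q_1, \ldots, q_N)$ be a random vector with range set

$$R_N(a) = \left\{ q \;\middle|\; q_i \ge 0\ (i = 1, \ldots, N),\ \sum_{i=1}^{N} q_i = a \right\}, \tag{6.2.29}$$

where a > 0. The vector q is said to have the nonstandardized multivariate beta distribution with parameter (a,m) if q has the density function

$$f_{n\beta}^{(N)}(q \mid a, m) = \begin{cases} B_N(m)\, a^{-(M-1)} \prod_{i=1}^{N-1} q_i^{m_i-1} \left(a - \sum_{k=1}^{N-1} q_k\right)^{m_N-1}, & q \in R_N(a) \\ 0, & \text{elsewhere} \end{cases} \tag{6.2.30}$$

where

$$a > 0 \tag{6.2.31a}$$

$$m_i > 0, \qquad i = 1, \ldots, N \tag{6.2.31b}$$

$$M = \sum_{i=1}^{N} m_i. \tag{6.2.31c}$$

The nonstandardized multivariate beta distribution is obtained from equation (6.2.1) by making the transformation

$$q = a\,p, \tag{6.2.32}$$

which has the Jacobian

$$\frac{\partial(p_1, \ldots, p_{N-1})}{\partial(q_1, \ldots, q_{N-1})} = a^{-(N-1)}. \tag{6.2.33}$$

It then follows that $f_{n\beta}^{(N)}(q \mid a, m) \ge 0$ and $\int_{R_N(a)} f_{n\beta}^{(N)}(q \mid a, m)\, dq = 1$.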

6.2.4 Marginal and Conditional Distributions. Let p be an N-dimensional random stochastic vector with the density function $f_\beta^{(N)}(p \mid m)$. We now show that the marginal and conditional distributions of ν of the elements of p (ν = 1, 2, ..., N-2) are, respectively, multivariate beta and nonstandardized multivariate beta. It is assumed, without loss of generality, that the elements of interest are $p_1, \ldots, p_\nu$.



q(v») • CKt «••• P » i • £ %) (6.2.3k) 

• £ v i«i i 

t(v) «* CfC, ...# K» x ~ *<"> - = tO (6.2.35) 
- l * iax i 



b(v) s £ ft. (6.2.36) 

acsvti * 

Alaa* ILafc 

£(«o) e» ($f f ..., n** ). (6.2.5?) 

*^^y^ &s£a& 2f the *aa&s© fN&aensioBaX atoahaetla veote? p* 
has tba mXt&varftat© beta dlats^butlcm v&th pajpaaatap a* than* fbp 
MBit ...» ft-2, tha saspglrsal &L8i«lb2$&o& of (p.» ...» pf ) is anlti'ragiate 

1 V 

beta* 

Cvrt»i) 
PCPjt •••• P v |a>" ^ <2<v)|/iM>» (6.2.33) 

asad th® csn&t&oml da»str&botS.o» of (&» ...» p )» gi^aa tfoafc 

(^ *» •••& K, *) s Cp f » •♦•* ?««)• ^ 8 wnstaadaapdlaad soltivaTiata 

beta* 



«&a?e 



DCP^ •••• P M ,8# 1 <v» ■ *pa (JCv) | **(*),>£(*)), (6.2.59) 

M 
>ii(v) a (n » ...» a - £ a) (6.2.^0) 



« Lsfe 



/£> (v) ® (©.» ...» B » 8L.). (6.2*£&) 



& ° Ct»*» ...ft ^ ft i «• 



«&9P© 



s»^i ioi 



Pl_ 



-JUi-o-xUrn w . > 






d^ let 



^° ( Va* •••• V 



then* bsr Stseorea S*2»% the Joint density fimetlen of <3f(\0* u) is 

D(S<*>» 3 |4i<v)r 82) « *p^ 1} (q(v) ^(v)) f^Cu | fg). (6.2A2) 

the mrginal daaasi^ ftmetlon of *%(») ia t therefore* 

J BCS(v), 2|£<*>. §2> d V' #A Wl * 4^ t)( S (w) \A (v)) * < 6 * 2 ^3> 

If (l^i* •••» %.j^ ffl ^vfrl* •*■• ^B-i^ ®* m *** ****&& variables 
pi 9 • •«» p are constrained \$p 

* V 

£ 2 p * t»b(v) i»t» ...» v (6.2»^) 

1*1 * 

and the oond&tienal density ftmet&on of (K> • ••» 1T ) has the kernel 

JT P.^ <i~b(»)- £ p.) A (6.2.*5) 

i»i * i*i * 

ufcioh Is the kernel of the TOnstandaaK&sed miltivariate beta density 

ftaotlan* 

^r 1} Ct(v) ) tM»U/£i»)h Q.B.D. 
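A simple consistency check on the marginal result is that every joint moment of $(p_1, \ldots, p_\nu)$ must be the same under the full distribution and under the lower-dimensional multivariate beta marginal. The sketch below is ours, not the thesis's: it assumes that the marginal parameter keeps $m_1, \ldots, m_\nu$ and pools the remaining elements of m into a single final element, and it uses the standard gamma-function expression for the joint moments. All names are hypothetical. Agreement of moments is a check, not a proof.

```python
import math


def dirichlet_moment(nu, m):
    """E[prod_i p_i**nu_i] for the multivariate beta density with parameter m."""
    M = sum(m)
    value = math.gamma(M) / math.gamma(M + sum(nu))
    for nu_i, m_i in zip(nu, m):
        value *= math.gamma(m_i + nu_i) / math.gamma(m_i)
    return value


def lump_tail(m, v):
    """Keep the first v parameters and pool the remaining ones into one."""
    return list(m[:v]) + [sum(m[v:])]


m = [2.0, 3.0, 1.5, 2.5]
v = 2
full = dirichlet_moment([2, 1, 0, 0], m)               # E[p_1^2 p_2] under the full law
marginal = dirichlet_moment([2, 1, 0], lump_tail(m, v))  # same moment under the marginal
print(full, marginal)
```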
P 

6.2.5 Moment Formulas. The moments of the multivariate beta distribution are most easily derived as a special case of the moments of the matrix beta distribution, to be considered in Section 6.3. These results are stated here and proved in the next section.

Let the random stochastic vector $p = (p_1, \ldots, p_N)$ have the multivariate beta distribution with parameter $m = (m_1, \ldots, m_N)$. Then, if

$$M = \sum_{i=1}^{N} m_i,$$

$$E[p_i] = \bar p_i(m) = \frac{m_i}{M}, \qquad i = 1, \ldots, N \tag{6.2.46}$$

$$\mathrm{var}[p_i] = \frac{m_i (M - m_i)}{M^2 (M + 1)}, \qquad i = 1, \ldots, N \tag{6.2.47}$$

$$\mathrm{cov}[p_i, p_j] = \frac{-m_i m_j}{M^2 (M + 1)}, \qquad i \ne j;\ i,j = 1, \ldots, N. \tag{6.2.48}$$

All higher moments can be computed from the recurrence relation

$$E\!\left[\prod_{i=1}^{N} p_i^{\nu_i} \,\middle|\, m\right] = \bar p_a(m)\, E\!\left[p_a^{\nu_a - 1} \prod_{i \ne a} p_i^{\nu_i} \,\middle|\, T_a(m)\right], \tag{6.2.49}$$

where the $\nu_i$ are nonnegative integers, a is any index such that $\nu_a > 0$, and $T_a(m)$ is the vector m with the element $m_a$ increased by unity.
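The recurrence lends itself to a short loop: strip one power at a time, multiply by the current mean of the chosen element, and increase the corresponding parameter by one. The sketch below is a hedged illustration in exact rational arithmetic; the function name and the parameter vector (2, 3, 5) are hypothetical. It then forms a variance and a covariance from the second moments so that they can be compared with the closed-form expressions.

```python
from fractions import Fraction


def joint_moment(nu, m):
    """E[prod_i p_i**nu_i | m] computed by repeated use of the recurrence."""
    nu = list(nu)
    m = [Fraction(x) for x in m]
    moment = Fraction(1)
    while any(nu):
        a = next(i for i, v in enumerate(nu) if v > 0)  # any index with nu_a > 0
        moment *= m[a] / sum(m)                         # mean of p_a under current m
        nu[a] -= 1                                      # one power of p_a removed
        m[a] += 1                                       # T_a(m): m_a increased by unity
    return moment


m = (2, 3, 5)
M = sum(m)
mean_1 = joint_moment((1, 0, 0), m)
var_1 = joint_moment((2, 0, 0), m) - mean_1 ** 2
cov_12 = joint_moment((1, 1, 0), m) - mean_1 * joint_moment((0, 1, 0), m)
print(mean_1, var_1, cov_12)
```

Any index with a positive exponent may be chosen at each step; the loop simply takes the first one.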

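In matrix form, (6.2.46) - (6.2.48) can be summarized as the mean vector,

$$E[p] = \frac{1}{M}\, m \tag{6.2.50}$$

and the variance-covariance matrix,

$$V[p] = \frac{1}{M^2 (M + 1)} \left[ M\, \mathrm{diag}(m) - m^{T} m \right], \tag{6.2.51}$$

where diag(m) is the diagonal matrix whose diagonal elements are those of the row vector m.

Using (6.2.32), it is easily seen that, if q has the nonstandardized multivariate beta density function $f_{n\beta}^{(N)}(q \mid a, m)$, then

$$E[q] = \frac{a}{M}\, m \tag{6.2.52}$$

and

$$V[q] = \frac{a^2}{M^2 (M + 1)} \left[ M\, \mathrm{diag}(m) - m^{T} m \right]. \tag{6.2.53}$$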



6.3 && li&fcfe M& mflSaSUiftflto* 

The K2S rente generalises! stochastic raatg4x$ £ « [p. J» is said 
to hap© the mtsls beta distribution with paraaeter % » [as* 3 if 2 
has #j© joint density taction 

4™\m - ft ft I y«[ A 1 t «v 

Kp 1*4 j*i te»| If «& ij £ -*k 8 $ 

« 0, elsewhere (6.3.1) 




«hs<?© %. Is a K s M taatsis s*a©h thai 

SB 

nf . > O k*l» ...» S. (6.3.2) 

*5 i»3»l» ...» ST 

k k 

and B (sl) Is d©f3Ln©3 t$r (6.2.^) • the genes*!© sera of |J is djsnoted el * 

an gfrdfoaanstonal ^©otop. Th© total ooB&si* of raws of both £ fisid ffcf 

N 
is R a £ K. • lb b® quit© general* ft© adadt th© peeaiMliV ttot K « a ° 



£©S» 8©t39 i. 

The matrix beta distribution is the joint distribution of K(N-1) random variables, $p_{ij}^k$. The remaining K elements of P are determined by the relations

$$\sum_{j=1}^{N} p_{ij}^k = 1, \qquad k = 1, \ldots, K_i;\ i = 1, \ldots, N. \tag{6.3.3}$$

Inspection of equation (6.3.1) shows that the matrix beta density function is the product of K multivariate beta density functions.
Itfoltasj that 4p*^(fl%)^0 aw* 

Jz® 9 ®\g\*&m »1» (6.3.5) 

%*«p©<j£ = ji it nap*. 

i«l M, $*i *J 

She ffrdly ©f matrix beta dists4bat&ons is a fa$&fy indexed fe^ th© 

pawaaeter % z % « *&©re the asteissabl© pamaetsr set Is 
^^ K»!3 

iT K K o ^%feia StKN, E ^ 1 >0 ONU •••! ^8 l t j"lt ...» H)|» 

(6.3.6) 



the positive ortfesnfc of E^. S&ae© f* ((? \%) ie a eontisuotia function 
of % Ibr all 2. e & v «t OeroHasy 2.^.5 itapSlies that th© fossil of matrix 




beta distributions is continuous in M. In Theorem 2.2.1 it was shown that the matrix beta distribution is the natural conjugate distribution for a Markov chain which is observed under the consecutive sampling rule. It then follows that the family of matrix beta distributions is closed under consecutive sampling. This property is used in the following theorem to derive the moments of the matrix beta distribution.
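Closure under consecutive sampling can be made concrete with a small computation. The sketch below assumes the usual conjugate rule, in which the observed transition counts are added element by element to the prior parameter; that rule is the subject of Theorem 2.2.1 and is not restated in this section, so it should be read as an assumption of the example. The function names and the numbers are hypothetical, and each row is treated as a single multivariate beta vector.

```python
from fractions import Fraction


def posterior_parameter(M, F):
    """Add observed transition counts F to the prior parameter M, element by element."""
    return [[m + f for m, f in zip(m_row, f_row)] for m_row, f_row in zip(M, F)]


def mean_matrix(M):
    """Row-wise means m_ij / (row sum), one multivariate beta vector per row."""
    return [[Fraction(m, sum(row)) for m in row] for row in M]


prior = [[1, 1], [2, 2]]       # hypothetical prior parameter
counts = [[3, 1], [0, 4]]      # hypothetical observed transition counts
posterior = posterior_parameter(prior, counts)
print(posterior)
print(mean_matrix(posterior))
```

The posterior parameter is again a matrix of positive elements, so the posterior distribution stays within the matrix beta family, and the posterior mean of each transition probability is the updated parameter divided by its row total.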

~ Clf J be a random generalised steohasti© 

^ am k 

matrix which has the aatri.i beta <&stribution with parameter JW » Cm. .]• 

Then* *_ 

vay ft£j e \1 . ( S " "* 3 » k*l» ...» S. (6.3.8) 

* 5 (M 4 k r(H .* ♦ i) i,>4 $ ... t % 

4 i 

eovCt^o p£.] » -SL^ — . f ^k»l K (6.3.9a) 



a' * o 



p.&*i» ..., K 



p»o- 

»0» $te or c^y (6.3*9b) 



w k 
f « £ m i . te»i t ...» X (6*3.10) 

1 3»I *J i»i, ...» K 1 

k _ k 

£&&£• Let T (9*0 be the taatrlx W with the eleawmt ® ineres-sod 

by unity, then 

k 
- ®i^ (6.3.U) 




B»r 4 £ k or s £ v 9 ^ g end p^ ay© independent random "easiableg and 
^S* ^V a ° 9 ^f J «» k and a » v» **«** 2-3.2 yields 



^> **(<*<&» 



•"-«£. 



■ ?fi .iff. » p ^ S (6.3.12s) 

.2L-— » £ » ft (6.3.12b) 



frost *Mch ©qaat&ona (6.3.8) end (6.3.9a) fbllew. O.B.D. 
% wxdtlng eq^at&on (6.3.3) as 



M i£ + i (6.3.13) 

£ w C|f J < 1 . to*, ...» R (6.3.W 

$Lgil2aply» egtrntlota (6.3*%) ean b® wriitea as 

■««£>• 1&3 - - J - (6.3.15) 

and, sine© py^ £ 1 - p^ R0 we have the bsrands 



^1 * 0*0^.1^3*0 (6.3.16a) 

- vart^l * «>vC^ P^l '* (6.3.16b) 

for kM, ...» K a 3 «tM s i 9 ..„„ *}. 



When P has the matrix beta distribution the rows of P are mutually independent; thus, the general joint moment is



n n jf at «>** ■ n if * n rfj^ 

i«i j»l km* *3 J i«3l kei |j*i *3 



(6.3.1?) 



where the vr. are n3n»nagativ@ Integers* Lot 



S 






(6.1.18) 



The following theorem provides a recursive formula for computing this moment.



6 r ^2 If the Ktl pandos Batrix 2 ^3 $&® asatrix beta 
distribution %&th parameter % » than 



E 



[ft <^ 3 >%] - &*> e [qJA 1 j ^ l v* j] 



&*!» ... 9 K (6.3.19) 
i«i *••» K 2, 



k 



sfeor© the sjl - ape manegatlv& integers and a Is assgr index sash that 

£gg@£* 'She theoroQ folSos® isasedHateSy fey applying L@ssaa 2.3. § to 
©qaat&on (6.3.13). Q.E.&, 



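Since the multivariate beta distribution is a special case of the matrix beta distribution in which K = 1, we immediately have the following corollary.

Corollary 6.3.3 Let the random stochastic vector $p = (p_1, \ldots, p_N)$ have the multivariate beta distribution with parameter $m = (m_1, \ldots, m_N)$. Then, if $M = \sum_{i=1}^{N} m_i$,

$$E[p_i] = \bar p_i(m) = \frac{m_i}{M}, \qquad i = 1, \ldots, N \tag{6.3.20}$$

$$\mathrm{var}[p_i] = \frac{m_i (M - m_i)}{M^2 (M + 1)}, \qquad i = 1, \ldots, N \tag{6.3.21}$$

$$\mathrm{cov}[p_i, p_j] = \frac{-m_i m_j}{M^2 (M + 1)}, \qquad i \ne j \tag{6.3.22}$$

$$E\!\left[\prod_{i=1}^{N} p_i^{\nu_i} \,\middle|\, m\right] = \bar p_a(m)\, E\!\left[p_a^{\nu_a - 1} \prod_{i \ne a} p_i^{\nu_i} \,\middle|\, T_a(m)\right], \tag{6.3.23}$$

where the $\nu_i$ are nonnegative integers, a is any index such that $\nu_a > 0$, and $T_a(m)$ is the vector m with the element $m_a$ increased by unity.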

Before considering the marginal and conditional distributions of submatrices of P, it is necessary to define the nonstandardized matrix beta distribution. Let

&« (4» •••! •/*# tgS ...» ^*N) (6.3.S&) 

be a K-diaeBS&c&al ^aots?^ ssh@?e 

^v0» te»l» *•••*« (60.23) 

* i«i» ...» 8T 

X«t (£ » £p, J b© a 8 x fJ soncta mtsix i&th psage sat 



£1 



j(f | | is K x R, p^,, i. 0» £ ||U » a^» (k»i, ...» Kj i©>S* ...» B)J 

Let %{ o[el JbaaSsH siaia&s of positive elauBts. Ttan <f is ss&d 
if 9 lias the jodtafc tasit^ tae*31oa 









(6.3,2?) 



where gT and ^ as 1 ® ggcaerie psws of $ sad jg » respeetivety» and 

(fJ) »* I fe & 
^6* ^? I %• 9j ^ is ^** Boaatandardiaed s&lttoaslafce beta dists&buttion 

defined tgr ©p&tion (6.2.39). 

We now ©ossicles* the aaspginal end ©enditiBEiaX dists&bat&sns ©f any 
^ E v sabiaaig&x of S. ^ea £ has the aatffiz bete d&sta&feat&on. lb 
siigpliSy the asiat&ojfts asanas that the elssents oS % and % have been 



/V 



relabelled so that 2 « C£ J and 9tf « Oa ] (i«l # ...» Kg £*i, ...» K)» 
and that the sohaaiB&H of £ t&leh is of interest consists of the elements 
K . Ci«lo ..«t 8 5°i» •*•» «)» «here f «\i» •••• K s ^ w i*» •••• ^5 
Define the {s(w*i) toatxis 



fv 



53 



P M ... p 



... 



W „ ... O 

"ft > 



i- 


V 


ij 


i- 




p 



(6.3.28) 



end the ^ st (m + i) aatrix 



T » 

8 />*> 



v 



p ... p 



iv 



i - b. (v) - e ?; 

i 



p*« •*• 



%> 



^v 



i - b f <v) « 



(6*3.29) 



8»i 

b.(v) » £ K« 
1 jp^j ij 



i«i f ...» f (6.3.30) 

CerreapondingSy, tae define the p ss (v + i) paraaeter asat^Loss* 









11 i« jm»l IS 



*fl 



• • • 



"% ^« B flJ 



(6.3«3i) 



S3 £\> 



'ii 



• #• 



*iv K SW 



• « • 



"fi 



• •• 83 _ BE 



(6.3*32) 



^VpKW!* &t%tfo kat the K * 8 ffaraSom ganemliaed atodaastie saatelss $ 

have the natrlx bat© d&stsAbatto «&th pas?EDa©tes» z?! • Taen, fop f »i» •••# K 

assd v»i, .♦.* £M 9 the a^giaal joint dtstclbi&tion of (ff » •••»X » 

it iv 

"pL t •••» $„.») is satilx b©t&» 

P fcutl) 



fV 



and the conditional joint distribution of (P«d» •••• & )» given that 



Pa «» •••» P« ^r <s^» * 8 nsnstandta&sed raatsix beta* 

*»a V & *W W - '&"* )( 2 e J *W ev > 

(6.3« 



& <*,) « (i « b^v), ..., i • b f (*))♦ (6,3.35) 

Proof. The theorem follows immediately from Theorem 6.2.4, upon noting that the matrix beta density function is the product of K multivariate beta density functions.



<=>£.o&» 



6.4 Extended Natural Conjugate Distributions.

If P has the matrix beta distribution the rows of P are mutually independent random vectors. The decision-maker may, however, wish to use a prior distribution which admits non-zero correlation between the rows of P. Such a distribution may be constructed with the aid of equation (2.3.3), but at the expense of complicating the formulae for the moments. We illustrate this construction for the 2 x 2 random matrix

$$P = \begin{bmatrix} p & 1-p \\ q & 1-q \end{bmatrix}. \tag{6.4.1}$$

A theorem relating to the general K x N case is first given.



6.*» r l Lot h(%ty 9 *>) be the probability density ftsnet&on 
defined !#• ©qaafcLon <2*3*3) and let ^g bo the odrrosponding ©amended 
natarel conjugate Ssz&ly of disislbat&ens. Lot C(^ ^) be the normalising 
constant doflR@d ly ©qaat&on (2*3*2) and lot 1? . <#f) be the g&te&s 
ta&th the atasnt b? increased bgr wj&ty* Then the boss*©* "garianoeQ* ami 
©osarisnoea of h(f Iftf $*> ) ar© given ly 






krtU •••• K- (6,4.2) 



ij - 



if^tfcafiP&s ♦ ••* Is 
£BB0& Us&ng sxjtaatioao ( 2*3*3) and (2*3.2) 

u m 7 V # 



4 



*.« 



ca»l P^S, ya| 



C(^,ft> 




(6*4*4) 



tf&eh is equation <6.& &). Tteoraa 2.3.1 ahoas that #g is 
ooBgeeative ssrapaing* Hegj©©* Liaesroa 2.3.2 is applicable aad 






foias tthloh (6.4.3) fblLeea. Q.E.D. 



p£/f."> ^(^(fe^^t (6.M) 



Let P be gl^sa ly (6*4.1) awd let 



M 



3, 



®3 \ 



.2 



(6.4.6) 



(6.4.7) 



^ho?ea.> (i»t, ... t 4). &®t 

Thm (2.3.3) beoesG3©9 

hCfJ H) « C(M)(p-q) § p"*" 1 q"** ( WF* (*-*)*>". Ps / (6.4,8) 

E&slsating the nonaaSising osnstaist tfr aeaas of (2.3.2), <m £is3. ta@fe 
C(H) C Bfe^agMEu,^)^^ "*< 

(6*4.9) 

Efcpafcta (6.4.2) tfeao yio3ds the isean8 9 

BC^3 e ^)B(iaj 9 ^>»2BCi9 & ^2 t !a 2 )B<iiU*fi^) +B(t^+l»& 2 }B(ag*2»B^} 
pCfi) " 



e» 



1 .iwiwijii ■ ■ ■ urn wiim » «i w n iummuii.MMiJii 



B(a & +2«Bg)B(n« f s^)«8B(s^i»i!^B(fiu<«'i 9 e^) +B(ByG!g)S(ra^**%) 

(6.4.10) 

(6.4.U) 



^roia (6.4.3) to obtain the ao^gp&one® 



w [ft %0 * 

BCta^^BCiag+iftEfy)-^!^^ 

Bfe^cS^BC^ftE^-SSCffl^ft^Sft^^ 

(6.^.12) 

sad it is e«aa that thap© is gsGe^sos© ©werolatisa b©ta®®a t&© mm of tC 

n 

*-®fc J£( p ) s ( ^.(P)* •••» TCC* 5 )) b® th® staa^r stafc© ps^bafe&Sit^ vector 

a 1 e gf es 

oo*r«3posx&ng to the KssH stoehaa&o saatsdss P and let « » (v 4ft •••, \» } 

a * N 

bo a veotop of swmegati^© integer, to aetenfeg aatoml eoajsgai© 
distribution &s» ¥ tftddh is reg^tped tes> tbo analysis to fc® easv&ei out 
la G&apie? ? is fbraed fcgr luting 

4 • n (TT^A £./• 



3(M) 



CS 



i«=»a 



a 0. ©thanais© (6^.13) 

Let g s Ce\-1 be en f) st N raata&x of positive ©taants asd lot gu 
donate tfaa ith *c*? of M (i«»i» «.* B !?>• Thos tba HsS pan&a stochastic 

a 

mtds g is said to has® the aatg-is bota»i diatffibatlon slth papam&tsr 
(&*#) is F has fba Saint psobafe&Ht^ dsos&t^ teot&oa 

» 0» ©thswsis© (6.&« 

W(^ft\>) is tho seoipapooal of S£ jr, C ^1 (P» *] t&en P*has th© mta&s bate 
distribution uitfa pjaFsaotsp M p 

tf(H»v) eon bo oongntad rasing th© methods of Ssotion &•§» bat this roqp&Fes 



lengthy calculations.

Using Loasa ft.2.1 9 it is easily seen that 



J*£! C? I **>** " ** (6.4.116) 



Jai 



The first two assmanto of the distribution aro obtained fyo® eq&atiene 
<6.fc.2) aw! (6.ft.9), 



n _n . W(K 9 v) 

8 CK^ . | M # v] « _SLZi * * . • C6.ft.IB) 

aPvflU MM W(t J(T &!^ 9 vli 

N 

tfhere Bf 4 » S M. ^ and T (K) is the aats&s K vith its (l 9 J)th elasent 

increased bj unity. 

Due to the complex calculations required to obtain the normalizing constant W(M,ν), the matrix beta-1 distribution is presently of limited usefulness. This distribution is, however, of some importance since it is the natural conjugate distribution for one of the data-generating processes to be considered in Chapter 7.



6.5 The Beta-Whittle Distribution.

The beta-Whittle distribution is defined to be the unconditional distribution of the transition count F of a Markov chain with transition probability matrix P which is drawn from a matrix beta distribution. The beta-Whittle-2 distribution is defined in an analogous fashion. In this section explicit probability mass functions are derived for these distributions and their moments are discussed.



6.5.1 The Beta-Whittle Distribution. For fixed u and v (u,v = 1, ..., N) and fixed n (n = 1, 2, 3, ...), let

$$\Phi_N(u,v,n) = \left\{ F \;\middle|\; \sum_{i=1}^{N} \sum_{j=1}^{N} f_{ij} = n,\ f_{i.} - f_{.i} = \delta_{iu} - \delta_{iv}\ (i = 1, \ldots, N) \right\} \tag{6.5.1}$$

and let

$$\Phi_N(u,n) = \bigcup_{v=1}^{N} \Phi_N(u,v,n), \qquad u = 1, \ldots, N. \tag{6.5.2}$$

$\Phi_N(u,n)$ is the set of all possible transition counts, F, which can arise from a sample of n consecutive transitions in a Markov chain with initial state u and a positive transition probability matrix.

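The beta-Whittle probability mass function with parameter (u,n,M) is defined as

$$f_{\beta W}^{(N)}(F \mid u,n,M) = \begin{cases} \displaystyle\int f_W^{(N)}(F \mid u,n,P)\, f_{M\beta}(P \mid M)\, dP, & F \in \Phi_N(u,n) \\ 0, & \text{elsewhere} \end{cases} \tag{6.5.3}$$

where $f_{M\beta}(P \mid M)$ is the matrix beta density function of Section 6.3, u = 1, ..., N, n = 1, 2, 3, ..., and $M = [m_{ij}]$ is an N x N matrix such that $m_{ij} > 0$ (i,j = 1, ..., N).

When F has the beta-Whittle distribution with parameter (u,n,M) it is clear that F must have the range set $\Phi_N(u,n)$, since the set of stochastic matrices which have one or more elements equal to zero is a set of measure zero relative to the matrix beta distribution.

It is clear from (6.5.3) that $f_{\beta W}^{(N)}(F \mid u,n,M) \ge 0$. By comparing (6.5.1) and (6.1.5), it is clear that $\Phi_N(u,n) = \Phi_N(u,n,P)$, provided P is a positive matrix. Since the set of nonpositive matrices P is a set of measure zero and since $\Phi_N(u,n)$ is a finite set, we have

$$\sum_{F \in \Phi_N(u,n)} f_{\beta W}^{(N)}(F \mid u,n,M) = \int \left[ \sum_{F \in \Phi_N(u,n)} f_W^{(N)}(F \mid u,n,P) \right] f_{M\beta}(P \mid M)\, dP = 1. \tag{6.5.4}$$

Thus, the beta-Whittle mass function is a proper probability mass function.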



'%wqR <?«?^ 3fte betee&t&ttls mass fbnoticn s&th parameter Oa^M) 
las given l$r 



n 



(id , i«i is *• *• 






« 0, 



(6.5*5) 



N 



£ 



B(x 9 y) is the beta tomtitem* w& ▼ is $h© m&ep© 



solntUcsi of th© esjoa^osas 



*i. " f .t * °iu " V tKi 8 



&£££• Letting gsucSsmt© the i& sot? of H» 



8 









11 ^ *«« * 

1»S ^©1 *3 



iedl 



"OH 



i«i M **$ 



dP- 



(6.5*6) 



Vm iategraad ic tbe \sososQl et a tuafels bata deoaity ftsnetion t&th parasites* 



M ♦ p f horso©* using (6.2.U)» 



pw ° ' ° via j, j 



A. n*.> r*J 



i. *v* 



R H fp (f * a , } 

n n ' u ** 

J l ■* » <IIIIIW ■ '»! jfflllWIfflWBIII »W I III 



ft 

* &=4. i i« *• 

b p ^— ■ ■ ■■■■■ . (6.5.7) 

It Jl ^.BC* 44 » %4> 

The moments of the beta-Whittle distribution are somewhat complicated
to compute. Referring to equations (6.1.26) and (6.1.27), if $\tilde F = [\tilde f_{ij}]$
has the beta-Whittle distribution with parameter (u,n,M), then



fcnO 

es rs 



and, similarly,



,^* *^ 



*^# V • 



oy p8 •* ap j ^.j |> ts tja ''op ^ p py p v6 p oy "V^ p 6a p oP J 



o»PfY»&=i» ...« 8 (6.5.9a) 



o« •• 



In both of these equations $E[\cdot]$ denotes the expectation operator relative
to the distribution $f_{M\beta}^{(N)}(P \mid M)$. These expectations can be evaluated by
repeated application of Lemma 6.3.2 in a manner which should, by now, be
familiar, but the calculations, particularly in (6.5.9a), tend to become
extensive. Approximations of the sort we have discussed in Chapter 4 can
also be made. For small values of the parameter n, direct calculation of
the moments is probably the most convenient way to approach the problem.

6.5.2 The Beta-Whittle-2 Distribution. The set of all possible
transition counts, F, which can arise from a sample of n consecutive
transitions in a Markov chain with arbitrary initial state and a positive
transition probability matrix is

$$\Phi_N(n) = \bigcup_{u=1}^{N} \Phi_N(u,n). \tag{6.5.10}$$

The N x N random matrix $\tilde F$ with range set $\Phi_N(n)$ is said to have the standard
beta-Whittle-2 distribution with parameter (p,n,M) if $\tilde F$ has the probability
mass function

feU^iP " v/*wf<I l£»°»£> C^l »♦ < 6 '5-*« 

where $p = (p_1, \dots, p_N)$ is a stochastic vector which is functionally
independent of P, n=1,2,3,..., and $M = [m_{ij}]$ is an N x N matrix with
$m_{ij} > 0$ (i,j=1, ..., N). It is readily established that
$f_{\beta W2}^{(N)}(F \mid p,n,M) \ge 0$ and that

$$\sum_{F \in \Phi_N(n)} f_{\beta W2}^{(N)}(F \mid p,n,M) = 1.$$
Let 



Fe*%) (W •■• • 



**Hi (n) * (lll € ^V a) » f &. * *.i <*■*• •••• *>V < 6 *3.») 

and 

4>^Cn) « 4>* a (a) - ^* m «. C6.5.i3) 

Sinoa 

it fbUcss from £&s& 6.1,5 ®Rd the fast that p is ftsne&ongCO^ independent 

of P i&at 

a 






ffi<si8*#*jL ^4?<£i^^ 



%$> ** i *— a* ^ •"« -pw 



SI * 4«& &0 *• * # ,„ 

■ { «*TT —** m 

Vw H ft W 

In (6.5.14), if $F \in \Phi_N(n)$, (u,v) is the unique solution to the equations

An important case in which p is not functionally independent of P
occurs when $p = \pi(P)$, the steady-state probability vector corresponding
to P. In this instance we define the nonstandard beta-Whittle-2 distribution
with parameter (n,M) in terms of the following probability mass function,

4fL < F I «•**> • f ^5^1 «£•»**> l^tt I B)aP # (6.5.15) 

where n=1,2,3,... and $M = [m_{ij}]$ is an N x N matrix such that $m_{ij} > 0$
(i,j=1, ..., N). The vector $\pi(P)$ in the integrand of (6.5.15) is the
steady-state vector corresponding to P and is uniquely defined for all P
except a set of measure zero. It is clear that the range set of $\tilde F$ is
$\Phi_N(n)$ and that (6.5.15) is a proper probability mass function.



is the selected value of $\pi(P)$, then the nonstandard beta-Whittle-2
probability mass function with parameter (n,M) is given by

m 

K 

n f« s(f , ■ > 

W2 1 "•!? ■ ( * 'Vl * ^ V TT — * & € f M 

a 

H f B(f. , a. ) 

- '.4*B>%.-r-r- * = "^ 



In (6.5.17), if $F \in \Phi_N(n)$, (u,v) is the unique solution to the equations



*£, (I I HP - jj S*W£\ I M^^CFl *>». (6.S.I8) 

The kernel of the integrand of (6.5.18) is

-I* & J. >*'* *■*"'■ 

Thus, proceeding as in the proof of Theorem 6.5.1,



t^Lil I n»K) « 2 f*!?<F \ i # n»M) f TqWfff^P | P * H)di?>. (6.5.19) 

Equation (6.5.17) follows from (6.5.19) and Lemma 6.1.5. Q.E.D.

The moments of the standard beta-Whittle-2 distribution can be obtained
from the moments of the beta-Whittle distribution by taking the expectation

The moments of the nonstandard beta-Whittle-2 distribution are given by the
following theorem.

Theorem 6.5.3 Let $E[\cdot]$ denote the expectation operator relative
to the nonstandard beta-Whittle-2 distribution with parameter (n,M) and
$E_{M\beta}[\cdot]$ denote the expectation operator relative to the matrix beta
distribution with parameter M. Then

CKj » "SjJC^^^t i f >tt •♦•» » (6.5*21) 

end 



a 9 PfV9^i9 •••* H (6«5*2&i] 



*••• 



Proof. The theorem follows immediately from equations (6.1.53) and
(6.1.54), together with the relation



bcSq 



where g(F) is any function of F for which the expectation exists and $E_W[\cdot]$
is the expectation operator relative to the Whittle distribution with
parameter (u,n,P). Q.E.D.



CHAPTER 7
FIXED SAMPLE SIZE ANALYSIS

In Chapter 5 we examined some sequential sampling problems in a
Markov chain with alternatives. We now consider the prior-posterior and
preposterior analysis of a Markov chain governed by a fixed, but unknown,
N x N matrix of transition probabilities, $\tilde P$, from which a fixed number of
consecutive observations is drawn. In Section 7.1 this analysis is
carried out under the assumption that the initial state is known to the
decision-maker before the sample is observed. In Section 7.2 it is assumed
that the initial state is unknown and has a distribution which is functionally
independent of P; in the final section it is assumed that the chain is
operating in the steady state and that the initial state is unknown.

7.1 Initial State Known.

An N-state Markov chain can be considered to be a process which
generates the sequence of random variables, $\tilde x_0, \tilde x_1, \dots, \tilde x_n, \dots$, where
$\tilde x_i \in \{1, \dots, N\}$ is the state of the system immediately after the ith
transition (i=1,2,...) and $\tilde x_0$ is the initial state observed before the
first transition. This initial state, $\tilde x_0 = u$, is subject to the
distribution $p = (p_1, \dots, p_N)$, where p is a stochastic vector and
$p_i = P[\tilde x_0 = i]$ (i=1, ..., N). The transitions of the chain are governed by
the N x N stochastic matrix $P = [p_{ij}]$, where $p_{ij} = P[\tilde x_{n+1} = j \mid \tilde x_n = i]$
(i,j=1, ..., N; n=0,1,2,...). It is assumed in this section that the
initial state is known to the decision-maker; thus, in this case,



$$p_i = \delta_{iu}, \qquad i = 1, \dots, N. \tag{7.1.1}$$

Let $x = (x_0, x_1, \dots, x_n)$ be a sample
of n consecutive transitions in a Markov chain, where $x_0 = u$ is assumed
known to the decision-maker before the sample is obtained. Thus, x is
obtained under the consecutive sampling rule. Let $F = [f_{ij}]$ be the
transition count of x; then the conditional probability, given $\tilde P = P$,
of observing the sample x is

$$p_{x_0 x_1}\, p_{x_1 x_2} \cdots p_{x_{n-1} x_n} = \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.1.2}$$

If the stopping process is noninformative, then (7.1.2) is the kernel of
the likelihood of the sample. It is clear that the statistic F conveys
all the information of the sample and that, if stopping is noninformative,
F is a sufficient statistic.

When the transition probability matrix is regarded as a random matrix
$\tilde P$, the natural conjugate of (7.1.2) is the matrix beta distribution defined
by equation (6.3.1) with $K_i = 1$ (i=1, ..., N),

$$f_{M\beta}^{(N)}(P \mid M) \propto \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{m_{ij}-1}. \tag{7.1.3}$$

If $\tilde P$ has the matrix beta distribution with parameter $M' = [m'_{ij}]$ and if
a sample from the process yields a sufficient statistic F, Theorem 2.2.1
shows that the posterior distribution of $\tilde P$ is matrix beta with parameter

$$M'' = M' + F. \tag{7.1.4}$$

CS S3 O 



7.1.2 Sampling Distributions and Preposterior Analysis. It is assumed
that $x_0 = u$ is known and that n, the number of transitions to be observed,
is determined before the sample is obtained. Prior to sampling, the
transition count $\tilde F$ is a random matrix and the conditional probability,
given $\tilde P = P$, that the Markov chain will generate a specific sample x
which has the transition count F is given by (7.1.2). Whittle [41] has shown
that the number of samples of size n with $x_0 = u$ which have the transition
count F is given by

$$F^*_{vu}\,\frac{\prod_{i=1}^{N} f_{i\cdot}!}{\prod_{i=1}^{N}\prod_{j=1}^{N} f_{ij}!} \tag{7.1.5}$$

where $f_{i\cdot} = \sum_{j=1}^{N} f_{ij}$ (i=1, ..., N), v is the final state of the sample, and
$F^*_{vu}$ is the (v,u)th cofactor of the matrix $F^*$ defined by equation (6.1.3).
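The counting result can be checked on small cases. The sketch below is illustrative only: equation (6.1.3) is not reproduced in this section, so the code assumes the usual form of Whittle's matrix, $F^* = [\delta_{ij} - f_{ij}/f_{i\cdot}]$ with the ith row replaced by the identity row when $f_{i\cdot} = 0$. The function names `whittle_count` and `brute_force_counts` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def determinant(a):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not a:
        return Fraction(1)
    if len(a) == 1:
        return a[0][0]
    total = Fraction(0)
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total


def whittle_count(counts, u, v):
    """Number of samples with initial state u, final state v and transition count F.

    Assumption: F* has entries delta_ij - f_ij / f_i. (delta_ij when f_i. = 0).
    States are numbered 1..N.
    """
    n_states = len(counts)
    row_sums = [sum(row) for row in counts]
    f_star = []
    for i in range(n_states):
        row = []
        for j in range(n_states):
            delta = Fraction(1 if i == j else 0)
            ratio = Fraction(counts[i][j], row_sums[i]) if row_sums[i] else Fraction(0)
            row.append(delta - ratio)
        f_star.append(row)
    # (v, u)th cofactor: delete row v and column u, then attach the sign (-1)^(v+u).
    minor = [row[:u - 1] + row[u:] for r, row in enumerate(f_star) if r != v - 1]
    cofactor = (-1) ** (u + v) * determinant(minor)
    numerator = 1
    for s in row_sums:
        numerator *= factorial(s)
    denominator = 1
    for row in counts:
        for f in row:
            denominator *= factorial(f)
    return cofactor * numerator / denominator


def brute_force_counts(n_states, n, u):
    """Tally every sample of n transitions from state u by (transition count, final state)."""
    tally = {}
    for tail in product(range(1, n_states + 1), repeat=n):
        seq = (u,) + tail
        counts = [[0] * n_states for _ in range(n_states)]
        for i, j in zip(seq, seq[1:]):
            counts[i - 1][j - 1] += 1
        key = (tuple(map(tuple, counts)), seq[-1])
        tally[key] = tally.get(key, 0) + 1
    return tally
```

Exact rational arithmetic is used so that the cofactor, which is generally a fraction, combines with the factorial ratio to give an exact integer count.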

Thus, the conditional probability of $\tilde F$ is given by the Whittle probability
mass function defined by (6.1.7),

$$P[\tilde F = F \mid u,n,P] = f_W^{(N)}(F \mid u,n,P). \tag{7.1.6}$$

If a sample of n consecutive transitions is obtained from a Markov
chain with known initial state u and if the transition matrix $\tilde P$ has the
matrix beta distribution with parameter M', then the unconditional
distribution of the transition count $\tilde F$ is

$$D(F \mid u,n,M') = \int f_W^{(N)}(F \mid u,n,P)\, f_{M\beta}^{(N)}(P \mid M')\, dP. \tag{7.1.7}$$

It is seen from equation (6.5.3) that the unconditional distribution of
$\tilde F$ is the beta-Whittle mass function given by (6.5.5),

$$D(F \mid u,n,M') = f_{\beta W}^{(N)}(F \mid u,n,M'). \tag{7.1.8}$$

If the prior distribution of $\tilde P$ is matrix beta with parameter M', and
if a sample of size n yields a sufficient statistic F, then equations (7.1.4)
and (6.3.7) show that the mean of the posterior distribution is

$$\bar P'' = [\bar p''_{ij}] \tag{7.1.9}$$

where, if $m'_i = \sum_{j=1}^{N} m'_{ij}$,

$$\bar p''_{ij} = \frac{m'_{ij} + f_{ij}}{m'_i + f_{i\cdot}}, \qquad i,j = 1, \dots, N. \tag{7.1.10}$$
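The prior-posterior step of this section amounts to counting transitions, adding the count to the prior parameter as in (7.1.4), and normalizing each row as in (7.1.10). A minimal sketch follows, assuming states are labelled 1, ..., N and using illustrative function names that do not come from the text.

```python
from fractions import Fraction


def transition_count(sample, num_states):
    """Transition count F = [f_ij] of a sample (x_0, x_1, ..., x_n) with states 1..N."""
    counts = [[0] * num_states for _ in range(num_states)]
    for i, j in zip(sample, sample[1:]):
        counts[i - 1][j - 1] += 1
    return counts


def posterior_parameter(prior, counts):
    """Matrix beta posterior parameter M'' = M' + F, as in (7.1.4)."""
    return [[m + f for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(prior, counts)]


def posterior_mean(prior, counts):
    """Posterior mean of (7.1.10): each row of M'' divided by its row sum."""
    posterior = posterior_parameter(prior, counts)
    return [[Fraction(entry) / sum(row) for entry in row] for row in posterior]


# Example: a two-state chain observed for four transitions, uniform prior parameter.
sample = [1, 2, 2, 1, 2]
prior = [[1, 1], [1, 1]]
F = transition_count(sample, 2)
mean = posterior_mean(prior, F)
```

Because only F enters the update, two samples with the same initial state and the same transition count lead to the same posterior, which is the sufficiency property noted above.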

Before observing the sample, the posterior mean $\bar P''$ is a random matrix
which can take one of a finite set of values in the range set R(u,n,M'). Let

$$S(P) = \{F \mid F \in \Phi_N(u,n),\ \bar P'' = P\}, \qquad P \in R(u,n,M') \tag{7.1.11}$$

be the set of possible transition counts which result in a posterior mean
with the value $P \in R(u,n,M')$. Then, by (7.1.8), the distribution of the
posterior mean is given by the following probability mass function,

$$P[\bar P'' = P \mid u,n,M'] = \begin{cases} \sum_{F \in S(P)} f_{\beta W}^{(N)}(F \mid u,n,M'), & P \in R(u,n,M') \\ 0, & \text{elsewhere.} \end{cases} \tag{7.1.12}$$

We now assume that the initial state, $x_0 = u$, is unknown to the
decision-maker before the sample is observed, but that it has a probability
distribution, $p = (p_1, \dots, p_N)$, which is functionally independent of $\tilde P$
and which may or may not be known to the decision-maker. If p is unknown,
it is also assumed that the utility of any terminal decision made after
x is observed depends only on P and not on p.

Let $x = (x_0, x_1, \dots, x_n)$ be a
sample of n consecutive transitions in a Markov chain. Let $u = x_0$ be the
initial state observed and let $F = [f_{ij}]$ be the transition count of the
sample. Then the conditional probability, given $\tilde P = P$ and $\tilde p = p$, of
observing the sample x is

$$p_u \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.2.1}$$

If the stopping process is noninformative, then, since terminal utilities
depend only on P and not on p, the kernel of the likelihood of the sample
is

$$\prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}} \tag{7.2.2}$$

and F is a marginally sufficient statistic.

When the matrix of transition probabilities is regarded as a random
matrix $\tilde P$, the natural conjugate of (7.2.2) is the matrix beta distribution
defined by (6.3.1) with $K_i = 1$ (i=1, ..., N). If $\tilde P$ has the matrix beta
distribution with parameter $M' = [m'_{ij}]$ and if a sample from the process
yields a marginally sufficient statistic F, then the posterior distribution
of $\tilde P$ is matrix beta with parameter

$$M'' = M' + F. \tag{7.2.3}$$



7.2.2 Sampling Distributions and Preposterior Analysis. It is
assumed that n, the number of transitions to be observed, is determined
before the sample is obtained. Prior to sampling, the pair $(\tilde u, \tilde F)$ is a
random quantity and the conditional probability, given $\tilde P = P$ and $\tilde p = p$,
that the Markov chain will generate a specific sample x with the
statistic (u,F) is given by (7.2.1). The number of samples of size n
with initial state u which have the transition count F is given by (7.1.5).
Therefore, the conditional probability of (u,F) is given by the Whittle-1
probability mass function defined by equation (6.1.37),

The conditional distribution of the marginally sufficient statistic $\tilde F$ is
the Whittle-2 probability mass function given by equation (6.1.43),

If a sample of n consecutive transitions is obtained from a Markov
chain where the distribution of the initial state is known to be p and
where the transition probability matrix $\tilde P$ has the matrix beta distribution
with parameter M', then, provided p is functionally independent of $\tilde P$, the
unconditional distribution of the transition count $\tilde F$ is

or, by equation (6.5.11), the unconditional distribution of $\tilde F$ is the
beta-Whittle-2 probability mass function given by (6.5.14),

$$D(F \mid p,n,M') = f_{\beta W2}^{(N)}(F \mid p,n,M'). \tag{7.2.7}$$

If J£ is utioxngi ar^a lias #1® pAos» dists&bat&Qn ftaot&aa H(p| 1 f ) 9 with 
©sen p(^)» than equations (7»£.7) and (6*0*ft) ©to? that the 
tssaaaditienal distribution of F^is also beta»VMtti©»2 # 



f S? <gl|(t).^0. (7.8.8) 



If $\tilde P$ has the matrix beta distribution with prior parameter M' and if
a sample yields the marginally sufficient statistic F, the mean of the
posterior distribution of $\tilde P$ is given by equations (7.1.9) and (7.1.10).
Prior to observing the sample the posterior mean, $\bar P''$, is a random matrix
with the finite range set R*(n,M'). Let

$$S^*(P) = \{F \mid F \in \Phi_N(n),\ \bar P'' = P\}, \qquad P \in R^*(n,M') \tag{7.2.9}$$

be the set of possible transition counts which result in the posterior mean
$\bar P'' = P \in R^*(n,M')$. Then, from (7.2.7), we find that, if p is known, the
distribution of the posterior mean is given by the following probability
mass function,

$$P[\bar P'' = P \mid p,n,M'] = \begin{cases} \sum_{F \in S^*(P)} f_{\beta W2}^{(N)}(F \mid p,n,M'), & P \in R^*(n,M') \\ 0, & \text{elsewhere.} \end{cases} \tag{7.2.10}$$

Similarly, if p is unknown and has the prior distribution function $H(p \mid \psi')$,
the distribution of the posterior mean is

$$P[\bar P'' = P \mid \psi',n,M'] = \begin{cases} \sum_{F \in S^*(P)} f_{\beta W2}^{(N)}(F \mid \bar p(\psi'),n,M'), & P \in R^*(n,M') \\ 0, & \text{elsewhere.} \end{cases} \tag{7.2.11}$$



7.3 System Operating in the Steady State.

When the Markov chain is operating in the steady state and the initial
state, u, is unknown, the distribution of u is $\pi(P) = (\pi_1(P), \dots, \pi_N(P))$,
the steady-state probability vector associated with the transition matrix
P. In this case, observation of u provides information about P.

o ■ 



7.3.1 Prior-Posterior Analysis. Let $x = (x_0, \dots, x_n)$ be a sample
of n consecutive transitions in a Markov chain which is operating in the
steady state. If $u = x_0$ is the initial state and $F = [f_{ij}]$ is the transition
count of the sample, the conditional probability, given that $\tilde P = P$, of
observing the sample x is

$$\pi_u(P) \prod_{i=1}^{N}\prod_{j=1}^{N} p_{ij}^{f_{ij}}. \tag{7.3.1}$$

When stopping is noninformative, expression (7.3.1) is the kernel of the
likelihood of the sample and the ordered pair (u,F) is a sufficient statistic.

When P, the matrix of transition probabilities, is regarded as a
random matrix the natural conjugate of equation (7.3.1) is the matrix beta-1
distribution defined by (6.4.1), $f_{M\beta 1}^{(N)}(P \mid M,v)$. It is easily seen that,
if $\tilde P$ has the matrix beta-1 distribution with parameter (M', v') and if a
sample from the process yields a sufficient statistic (u,F), the posterior
distribution of $\tilde P$ is matrix beta-1 with parameter (M'', v''), where, if
$e_u$ is an N-dimensional row vector with uth component equal to one and all
other components equal to zero,

$$M'' = M' + F \tag{7.3.2a}$$

$$v'' = v' + e_u. \tag{7.3.2b}$$



As was noted in Section 6.4, the normalizing constant and the moments
of the matrix beta-1 distribution are difficult to compute. This difficulty
complicates the task of assigning a specific matrix beta-1 prior
distribution to $\tilde P$. Since the matrix beta distribution is also a matrix
beta-1 distribution with the parameter v = (0, ..., 0),
it may be convenient for the decision-maker to use a matrix beta prior
distribution for $\tilde P$ and we shall assume this to be the case in discussing
the preposterior analysis of a Markov chain operating in the steady state.

It is assumed that n, the number of transitions to be observed, is fixed in
advance of sampling and that the prior distribution of $\tilde P$ is matrix beta.
Prior to sampling, the conditional probability, given $\tilde P = P$, of obtaining a
specific sample x with the statistic (u,F) is given by (7.3.1). The
number of samples of size n with initial state u which have the transition
count F is given by (7.1.5) and, therefore, the conditional probability
of the statistic (u,F) is given by the Whittle-1 probability mass function
as defined by (6.1.37),

The marginal conditional distribution of $\tilde u$ is $\pi(P)$ and the marginal
conditional distribution of $\tilde F$ is the Whittle-2 distribution.

When a sample x is obtained from a Markov chain operating in the
steady state where the initial state is unknown and the transition
probability matrix $\tilde P$ has the matrix beta distribution with parameter M',
the unconditional distribution of the transition count $\tilde F$ is

$$D(F \mid n,M') = \int f_{W2}^{(N)}(F \mid \pi(P),n,P)\, f_{M\beta}^{(N)}(P \mid M')\, dP. \tag{7.3.5}$$

Therefore, using equation (6.5.15), the unconditional distribution of $\tilde F$
is nonstandard beta-Whittle-2 as given by equation (6.5.17).

It is then easily seen that, if the set S*(P) is defined by equation
(7.2.9), the prior distribution of the posterior mean is given by the
following probability mass function,

= 0, elsewhere (7.3.7)



CHAPTER 8
SPECIFIC RESULTS FOR A
TWO-STATE MARKOV CHAIN

Many of the matters considered in previous chapters are specialized
to the case of a two-state Markov chain in this chapter. The 2 x 2
transition probability matrix $\tilde P$ is assumed to have the matrix beta
distribution and explicit formulas are found for the means and product
moments of the n-step transition probabilities, the steady-state
probabilities, the process gain, and the expected total discounted rewards.
The chapter concludes with a result concerning the selection of an
optimal terminal policy for a two-state process with a special type of
reward structure. Most of the formulas derived here are double infinite
series; it appears doubtful that similar expressions can be obtained
for chains with more than two states.



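8.1

Let

$$P = \begin{bmatrix} 1-x & x \\ y & 1-y \end{bmatrix}, \qquad 0 \le x, y \le 1 \tag{8.1.1}$$

be the transition probability matrix for a two-state Markov chain. The
eigenvalues of P are the roots of the equation

$$|\lambda I - P| = \lambda^2 - (2-x-y)\lambda + (1-x-y) = 0 \tag{8.1.2}$$

and are found to be

$$\lambda_1 = 1, \qquad 0 \le x, y \le 1 \tag{8.1.3a}$$

$$\lambda_2 = 1-x-y, \qquad 0 \le x, y \le 1. \tag{8.1.3b}$$

The eigenvalues of P are distinct provided x and y are not both equal to
zero. When $\lambda_1 \ne \lambda_2$, Sylvester's Theorem leads to the spectral decomposition

$$P = \frac{1}{x+y}\begin{bmatrix} y & x \\ y & x \end{bmatrix} + \frac{1-x-y}{x+y}\begin{bmatrix} x & -x \\ -y & y \end{bmatrix}, \qquad x \ne 0 \text{ or } y \ne 0. \tag{8.1.4}$$

Equation (8.1.4) immediately gives the following expressions for the
steady-state vector,

$$\pi(P) = \left[\frac{y}{x+y},\ \frac{x}{x+y}\right], \tag{8.1.5}$$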



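and the $\ell$-step transition probability matrix,

$$P^{(\ell)} = \begin{bmatrix} p_{11}^{(\ell)} & p_{12}^{(\ell)} \\ p_{21}^{(\ell)} & p_{22}^{(\ell)} \end{bmatrix} = \frac{1}{x+y}\begin{bmatrix} y & x \\ y & x \end{bmatrix} + \frac{(1-x-y)^{\ell}}{x+y}\begin{bmatrix} x & -x \\ -y & y \end{bmatrix}. \tag{8.1.6}$$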

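In particular,

$$p_{12}^{(\ell)} = \frac{x}{x+y}\left[1 - (1-x-y)^{\ell}\right] = x \sum_{k=0}^{\ell-1} (1-x-y)^k, \qquad \ell = 1,2,3,\dots \tag{8.1.7}$$

$$p_{21}^{(\ell)} = \frac{y}{x+y}\left[1 - (1-x-y)^{\ell}\right] = y \sum_{k=0}^{\ell-1} (1-x-y)^k, \qquad \ell = 1,2,3,\dots \tag{8.1.8}$$

Let the process have the reward matrix

$$R = [r_{ij}] = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \tag{8.1.9}$$

where $r_{ij}$ is the reward earned when the process makes a transition from
state i to state j ($-\infty < r_{ij} < \infty$).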
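As a numerical check on the spectral form, the sketch below compares the closed-form matrix of (8.1.6) with repeated multiplication of P, using exact rational arithmetic. The function names are illustrative and not taken from the text, and the code assumes x + y is not zero.

```python
from fractions import Fraction


def n_step_matrix(x, y, steps):
    """Closed form (8.1.6) for the multi-step matrix of P = [[1-x, x], [y, 1-y]]."""
    s = x + y                       # must be nonzero
    lam = (1 - x - y) ** steps      # second eigenvalue raised to the step count
    return [[(y + lam * x) / s, (x - lam * x) / s],
            [(y - lam * y) / s, (x + lam * y) / s]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(p, steps):
    """Repeated multiplication, used only as an independent check."""
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def steady_state(x, y):
    """Steady-state vector (8.1.5)."""
    s = x + y
    return [y / s, x / s]


x, y = Fraction(1, 4), Fraction(1, 3)
P = [[1 - x, x], [y, 1 - y]]
```

With these values the off-diagonal element `n_step_matrix(x, y, 3)[0][1]` agrees with the finite geometric sum in (8.1.7), and `steady_state(x, y)` is left unchanged by multiplication with P.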



Let

$$M = \begin{bmatrix} m & n \\ p & q \end{bmatrix} \tag{8.1.10}$$

and assume that $\tilde P$ has the matrix beta distribution with parameter M.
Thus, $\tilde x$ and $\tilde y$ are independent random variables, each having the univariate
beta distribution. It is to be noted that equations (8.1.7) and (8.1.8) are
valid for all x, y except a set of measure zero relative to the matrix
beta distribution.



8.2 Hypergeometric Coefficients.

It will be convenient to use the hypergeometric coefficient, $(x)_k$,
in the equations of subsequent sections. This coefficient is defined here
and some of its properties are derived.

Let x be any real number and k any nonnegative integer. The
hypergeometric coefficient is defined by

$$(x)_k = x(x+1)\cdots(x+k-1), \qquad k = 1,2,\dots \tag{8.2.1a}$$

$$(x)_0 = 1, \qquad k = 0. \tag{8.2.1b}$$

If x > 0 it is clear that

$$(x)_k = \frac{\Gamma(x+k)}{\Gamma(x)} \tag{8.2.2}$$

and in the case x = 1,

$$(1)_k = k!. \tag{8.2.3}$$



Lemma 8.2.1 If $(x)_k$ is the hypergeometric coefficient defined by
(8.2.1), then the following relations hold,

$$x(x+1)_k = (x)_k (x+k) \tag{8.2.4}$$

$$x(x+1)_k = (x)_{k+1} \tag{8.2.5}$$

$$(x)_{k+1} = (x)_k (x+k) \tag{8.2.6}$$

$$x(x+1)_{k+1} = (x)_k (x+k)(x+k+1) \tag{8.2.7}$$

$$(x)_k (x+k)_v = (x)_{k+v}. \tag{8.2.8}$$

Proof. Equations (8.2.4) and (8.2.5) follow by writing

$$x(x+1)_k = x(x+1)\cdots(x+k) = (x)_k (x+k) = (x)_{k+1}. \tag{8.2.9}$$

Equation (8.2.6) follows directly from (8.2.4) and (8.2.5). To obtain
(8.2.7) we use (8.2.6) and (8.2.4) to obtain

$$x(x+1)_{k+1} = x(x+1)_k (x+k+1) = (x)_k (x+k)(x+k+1). \tag{8.2.10}$$

Equation (8.2.8) follows by direct expansion,

$$(x)_k (x+k)_v = x(x+1)\cdots(x+k-1)(x+k)\cdots(x+k+v-1) = (x)_{k+v}. \tag{8.2.11}$$

Q.E.D.
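The relations of the lemma are easy to confirm numerically. The following sketch defines the coefficient exactly as in (8.2.1) and checks (8.2.3) through (8.2.8) for a few rational values of x; the helper name `poch` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction


def poch(x, k):
    """Hypergeometric coefficient (x)_k = x(x+1)...(x+k-1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= x + i
    return result


def lemma_holds(x, k, v):
    """True when relations (8.2.4)-(8.2.8) all hold for the given x, k, v."""
    return (x * poch(x + 1, k) == poch(x, k) * (x + k)                    # (8.2.4)
            and x * poch(x + 1, k) == poch(x, k + 1)                      # (8.2.5)
            and poch(x, k + 1) == poch(x, k) * (x + k)                    # (8.2.6)
            and x * poch(x + 1, k + 1) == poch(x, k) * (x + k) * (x + k + 1)  # (8.2.7)
            and poch(x, k) * poch(x + k, v) == poch(x, k + v))            # (8.2.8)
```

Exact fractions are used so that the identities are tested as equalities rather than to floating-point tolerance; negative and fractional x are allowed because the definition (8.2.1) does not require x > 0.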



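8.3

We first consider the expected value of $\tilde p_{12}^{(\ell)}$. Using the binomial
theorem to expand the factor $(1-x-y)^k$ of equation (8.1.7),

$$(1-x-y)^k = \sum_{v=0}^{k} \binom{k}{v} (-1)^v y^v (1-x)^{k-v}, \qquad k = 0,1,2,\dots;\ 0 \le x, y \le 1, \tag{8.3.1}$$

we can write

$$E[\tilde p_{12}^{(\ell)}] = \sum_{k=0}^{\ell-1} \sum_{v=0}^{k} \binom{k}{v} (-1)^v\, E[\tilde x (1-\tilde x)^{k-v}]\, E[\tilde y^{\,v}]. \tag{8.3.2}$$

For $\alpha = 0,1,2,\dots$,

$$E[\tilde x (1-\tilde x)^{\alpha}] = \frac{1}{B(m,n)} \int_0^1 x^{n} (1-x)^{m+\alpha-1}\, dx = \frac{B(m+\alpha,\, n+1)}{B(m,n)} = \frac{n}{m+n}\, \frac{(m)_\alpha}{(m+n+1)_\alpha} \tag{8.3.3}$$

and

$$E[\tilde y^{\,\alpha}] = \frac{B(p+\alpha,\, q)}{B(p,q)} = \frac{(p)_\alpha}{(p+q)_\alpha}. \tag{8.3.4}$$

Thus we have

$$E[\tilde p_{12}^{(\ell)}] = \frac{n}{m+n} \sum_{k=0}^{\ell-1} \sum_{v=0}^{k} \binom{k}{v} (-1)^v\, \frac{(m)_{k-v}\,(p)_v}{(m+n+1)_{k-v}\,(p+q)_v}. \tag{8.3.5}$$

The following recurrence relation, which follows immediately from (8.3.5),
is of use for computing successive values of $E[\tilde p_{12}^{(\ell)}]$,

$$E[\tilde p_{12}^{(\ell+1)}] = E[\tilde p_{12}^{(\ell)}] + \frac{n}{m+n} \sum_{v=0}^{\ell} \binom{\ell}{v} (-1)^v\, \frac{(m)_{\ell-v}\,(p)_v}{(m+n+1)_{\ell-v}\,(p+q)_v}, \qquad \ell = 1,2,3,\dots \tag{8.3.6}$$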
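The double sum for the mean of the off-diagonal multi-step probability can be evaluated exactly for integer parameters. The sketch below is a hedged illustration: it assumes the parameter matrix has rows (m, n) and (p, q), so that x has a beta density proportional to x^(n-1)(1-x)^(m-1) and y one proportional to y^(p-1)(1-y)^(q-1). The second function is an independent check that expands (1-x-y)^k by the multinomial theorem and uses only the raw beta moments. All function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def poch(x, k):
    """Hypergeometric coefficient (x)_k of (8.2.1)."""
    result = Fraction(1)
    for i in range(k):
        result *= x + i
    return result


def mean_p12(m, n, p, q, steps):
    """Double sum of (8.3.5) for the mean of p_12 after `steps` transitions."""
    total = Fraction(0)
    for k in range(steps):
        for v in range(k + 1):
            total += (comb(k, v) * (-1) ** v
                      * poch(m, k - v) * poch(p, v)
                      / (poch(m + n + 1, k - v) * poch(p + q, v)))
    return Fraction(n) / (m + n) * total


def mean_p12_direct(m, n, p, q, steps):
    """Check: E[x * sum_k (1-x-y)^k] from the raw moments E[x^a] and E[y^b]."""
    total = Fraction(0)
    for k in range(steps):
        for i in range(k + 1):
            for j in range(k - i + 1):
                coeff = factorial(k) // (factorial(i) * factorial(j) * factorial(k - i - j))
                total += (coeff * (-1) ** (i + j)
                          * poch(n, i + 1) / poch(m + n, i + 1)   # E[x^(i+1)]
                          * poch(p, j) / poch(p + q, j))          # E[y^j]
    return total
```

For one step the formula reduces to n/(m+n), the prior mean of x, and each further step adds the single inner sum that appears in the recurrence (8.3.6).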
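In a similar fashion an expression for $E[\tilde p_{21}^{(\ell)}]$ is easily found,
using (8.1.8),

$$E[\tilde p_{21}^{(\ell)}] = \sum_{k=0}^{\ell-1} \sum_{v=0}^{k} \binom{k}{v} (-1)^v\, \frac{(m)_{k-v}\,(p)_{v+1}}{(m+n)_{k-v}\,(p+q)_{v+1}}. \tag{8.3.7}$$

For purposes of computation, we have the recurrence relation

$$E[\tilde p_{21}^{(\ell+1)}] = E[\tilde p_{21}^{(\ell)}] + \sum_{v=0}^{\ell} \binom{\ell}{v} (-1)^v\, \frac{(m)_{\ell-v}\,(p)_{v+1}}{(m+n)_{\ell-v}\,(p+q)_{v+1}}. \tag{8.3.8}$$

The derivation of (8.3.5) and (8.3.7) depended upon the form of the
expressions (8.1.7) and (8.1.8). Similar expressions cannot be obtained
for $p_{11}^{(\ell)}$ and $p_{22}^{(\ell)}$. Thus, the diagonal elements of the mean $\ell$-step
transition probability matrix must be computed from the relations

$$E[\tilde p_{11}^{(\ell)}] = 1 - E[\tilde p_{12}^{(\ell)}] \tag{8.3.9a}$$

and

$$E[\tilde p_{22}^{(\ell)}] = 1 - E[\tilde p_{21}^{(\ell)}]. \tag{8.3.9b}$$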



We now verify that $E[\tilde p_{12}^{(\ell)}]$ satisfies the recursive equation (4.1.2).

That is, we shall show that $\bar p_{12}^{(\ell)}(M) = E[\tilde p_{12}^{(\ell)}]$ satisfies

where $\bar p_{ij}(M)$ is the expected value of $\tilde p_{ij}$ when $\tilde P$ has the distribution
$f_{M\beta}(P \mid M)$ and where $T_{ij}(M)$ is the parameter matrix M with its (i,j)th
element increased by unity.



Pjg^ a Af (8.3.1W 



WJP ■ *q » <8#Mlb) 



the sight side of (8.3.10) is 



ami 



Cl 



■a kMk V! 

atnfri 






z 4 -* k k 



k 



k 



.v 



(B W p > 







• (3.0*12) 



SJa&ng (S.S.fc), eqjaatioa (8.3.U) eon be sasAttea 



a 

SinoQ 






p*qH> 






a+n+ltk»v 

(8«3«i3) bsiooBsa, upon apposing (3.2.6), 



njHW-1-rtc-v 
(8.3.13) 



(8.3.t*a 



am 



&bQ vaO (start), Cpfr^L> * 



/«-! k is 

te"0 \s»0 v 



H*WV ^V 




(8.3.15) 



k Is fcfri 

Letting 3 *» \*1 in the l&vst sea sad rating that ( J*( v )°( ) §> 

ss© obtain 
©*n koO (jam**). «d v '(»t) , , ,,, , 




(pK, 'k*l J 

«Moh* upon XetUag 4 » tell and ooHeetang tons* is pj* (Jp» •• 



(8.3.16 



required. A similar derivation shows that $\bar p_{21}^{(\ell)}(M)$, as defined by (8.3.7),
satisfies (4.1.2).

8.** Expecte d Va3a© of F o£ H y6, 

in i?—— lrniir>i> u«c>Ma aaaw wa mn ■in" i»">™ 

Using (8^i»?)j tre have* for fixed P, 





fill ? fr'"* 1 ^ J* 

PJV> • * * S (l-as-?, (8.4.1) 

and* therefore* 
With 2 ) -^ ^ ? <^>M> V s C?*3 s DftMO* 1 ^]. <8.*.2) 



* _ BCateta+2) (n)* (a) 

Ef3f (1-30*3 « ■ ■ * ~4- j *i < 8 ^ 5) 

o^OfigSp ... 
nee have 

ECCEWI- , * £ 2 £ (^X-l) ***** » 

12 (am) 2 J«© \saQ vao v 




Sto&lfipl^t by eqaatiora (8«1.8)» 
el 



» „ m «• (-rt^C^ 



we have

«K"tf i - * " 4 " J * <^<-» v ( " Ws> V . • 

*2 21 ^ jaO fenO vbO " (art***) -^j^~ 

/*»l,2f»« (8.4.8) 

T&e easa astbod can fo© me& to dssdv© sawm gsssejr&t pso&iot soasnts of 
<fc© fete BO^^l. 

We now obtain expressions for the means and product moments of
$\tilde\pi = (\tilde\pi_1, \tilde\pi_2)$. Silver [38], treating the special case where y is
known and $\tilde x$ has the beta distribution, has shown that $E[\tilde\pi_1]$ is a Gaussian
hypergeometric function.
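By Theorem 4.2.5, $\lim_{\ell \to \infty} E[\tilde p_{ij}^{(\ell)}] = E[\tilde\pi_j]$ and, using equations (8.3.7)
and (8.3.5), we immediately have

$$E[\tilde\pi_1] = \sum_{k=0}^{\infty} \sum_{v=0}^{k} \binom{k}{v} (-1)^v\, \frac{(m)_{k-v}\,(p)_{v+1}}{(m+n)_{k-v}\,(p+q)_{v+1}} \tag{8.5.1}$$

and

$$E[\tilde\pi_2] = \frac{n}{m+n} \sum_{k=0}^{\infty} \sum_{v=0}^{k} \binom{k}{v} (-1)^v\, \frac{(m)_{k-v}\,(p)_v}{(m+n+1)_{k-v}\,(p+q)_v}. \tag{8.5.2}$$

Theorem 4.2.5 implies that the series (8.5.1) and (8.5.2) both converge.

We shall show that they converge conditionally. Neglecting the constant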

aa&tlpli©r •«». v and noting that O & C,«_,,)» th© series of absolute 

ia*n 
Raines corre spo nding to (8.5.2) is 



Erdélyi [17], ch. 2.



a> k k (a) Cp) oo <x> uAt, (sa). (p) 



fee© x*>0 v i'ia^ijir' "Tp^)' " v=® IssO (jasS l i'i' ,ll ('p*q) ' 



\»0 k«b late*!} '(p*q) k? vl 

k v 



s F 2 (&*ffl*p»n+n*£ p ptqg l»i)» (8*5*3) 

where $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ is Appell's second hypergeometric function of
two arguments [2]. Since $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ diverges whenever $|x| + |y| > 1$,
the series (8.5.3) diverges and, therefore, the series (8.5.2) converges
conditionally. A similar proof establishes the conditional convergence of
(8.5.1).

It is easily TOdfted that E[ fh] and E[ ir 3 satisfy equation (fc*2*tiQa). 
Let 7f.(K) o 2£ if 4 ] (j=«i»2t)* Then it mst be ston that fF(g) satisfies 

Tf 5 (M) m Z iy$ M) ^(M). J*i,2 (8*5*0) 

We shall consider the ease 3°2$ the papoof ft>r £4 is sfcaUar* 
Hbr J » 2» the s&g&t aid® of (8*5*0) is 
co k ., v « ( a )» (p) ^« no (»), (p)», 

^U*» *» HnlfcOt « u im rUMii i um ,f i i .w iiiii ii mim ^ tarn <damM i»ii i, i ■ u iM iwn n myi » ni i r « r ii> ) ii i 

feao vao ^^ (ama-I) £p»^} a*n p*q (a*a*i), (p*jHl 

«JbL* £ S (\j' '**' .■■ nam, mq i | i> i i i»nii« U i jubi i j i i i mun i 

m a»n tsaQ ^»0 (a^i)^ v '^'vt-i 

% taaing (8*2*6) we see Ifeat 

W vtl * q(p) v „ H<W*> . <*>* {8 - 

eBawiniMi ■ ■mm mi »yi i w i I 'm wm h imi i iih i m n ir » i»» ■ i .if i n iii i ii »^ »m«i 9 \ XJI *w 

and, therefore* that (8*5*5) ie oqoal to T^JH)* 




BNKwt.24, K- BQ^f&^-BE^V mi «e obtain^ 



g&OoM&x^ ecpatioas toa (8*<t*6) » (8*&.*0» and (8***.8)» 



? <^><-i>* °V» Wv " 






«^frj- 



C«a) . . <p>. 




(8*5.?) 



(8,5.8) 



}»te»v 




^^ PO kP© moO 



.« (8.5.S 

eas&@o (8.5.7) - (9.5.9) «r© ej^tteaalSy oswsisgeat. Wa 

mnofcpaio the pseoff &>s» «pat4en (6*5.8) . Qy 1heor@B &.2.6 9 the <tefc&® 
inttzdte seriLos (8*5*8) oeswssagos* SsgXsstfoag tfe© osnatet ssaltlpa&e!? 
(a)* 



(sm), 



9 tho oors^spossSlBg ses&ea of ebso&at© to2u©s la 



«0 CO $Hs 
ESS 



V 



(B) 3*.v (p) v 



smw-S) . . (pfrej 



j"0 tsa© voO 
Bsteg (8.5*3) i9» csa tnlto aqoatte* (8*5.10) as 



(8«5*i0) 



Ul^PiBm*, pj^i 1,1) ♦ 2 C S <*«) .,,**?> v , , , f (8*5*1*) 

* j»t IsoO voO W^JIiT (pfcjF 



(3*5.3) Is ®m&items&}& wmapgeeA* SfcaSta? pm»£te 
she*? that (3*5 *?) sod (6*5*9) also ©amrsspge cocs31tlotaa22^'. 

It can b© Trapf&aod tfeai EC fi^ ttA sat&aHos eqaa&on (**2*55&)* Xt» 
alga/beia ie otfle&gJafcg&OTsad bat taboos a*3 i£&l sot bo s*£«od&ed hope* 



8.6 Process Gain.

The expected gain of the two-state Markov chain considered in this
chapter is, by (4.4.3),

$$\bar g(M) = \sum_{i=1}^{2} \sum_{j=1}^{2} \bar\pi_i(T_{ij}(M))\, \bar p_{ij}(M)\, r_{ij}. \tag{8.6.1}$$

If the reward matrix R is given by (8.1.9), the expected gain is

13 

5(H) « ? S (5)(4) V f &db ^W^vH 



Applying (8.2.4), (8.2.5) and (8.2.6), we have

g(H) b JLS £ (*) W) V " Inv P *1 c | (Bltel(v)4bw gL, 3# 

(8.6.3) 

It is clear that (8.6.3) must converge conditionally.



8,? 

starts in state 1 is gtaft ^ eqo&tton (£>«3«?) ©0 



M 



*° JL t* & A 5^ CUD) 5 J^) ^* **M <e-7*i) 



S«„(M) e 

/■I 



s l5 m • j[ p' 1 p£7C| 






R» Ci^J) c (1*2), we ba^®» ty eqpattan (8.3*5)p 

(bX (p) 



3 f *<B> « £ ^JL 2 £ ft 



r to»v 



pa 



oe> oo 

* 2 £ 
/«a0 to"0 



^ 2 



& 



v Wt^. (p)« 




(8.7*3) 



SSLoos 



n 
am 



k 

S 

M»0 V 



CnHfrl^CpH}), 



k fc 



a |S C2Ci*^) 3 ^ti (8.?.fc) 



W© hS3© 



op <» 



©m /'ssO \gaO r 



£ ( V 'M> 



fe»v*^'v 



[aHfttl 



oo CO 

ip £ 2 

Had ratio test stass tbat 2 (/*+!) ^ eeac?sgg9G t hoas©, t» 
cfcasge tha £&rat fcao stssmtton ©perateH® in (8.7,3) to obtain 






(8.?.$) 



k 



(ok <p) 






to»v 



,. (8,7.6) 



Neglecting the constant multiplier, the series of absolute values
corresponding to (8.7.6) is, upon interchanging the order of summation,



« oo (feW) t(t9>. (p) 

2 2 ,„, v ,„ , 

**> 1<p0 kt»i (E«««-i) k (pwa) v 



ft? 



a FgCiom^p, om>i $ pHjf 0, £), 



(8.?.7) 



where $F_2(\alpha; \beta, \beta'; \gamma, \gamma'; x, y)$ is Appell's second hypergeometric function of
two variables [2]. Appell has shown that the series (8.7.7) converges if
$0 \le \beta < \tfrac{1}{2}$ and diverges if $\tfrac{1}{2} < \beta < 1$. The case $\beta = \tfrac{1}{2}$ has not yet been
investigated. We conclude that (8.7.6) converges absolutely for $0 \le \beta < \tfrac{1}{2}$
and, since Theorem 4.3.2 implies the convergence of (8.7.6), that the
series converges conditionally for $\tfrac{1}{2} < \beta < 1$.

For (i,j) = (2,1) we use equation (8.3.7) to obtain



K) « lip 



c* 



rkx n k/ *\V 



W IH, <P> *H 



S 91 (K) «W £ S {%) £*(-!) 

k*0 v** (wfn) k.v (M) NH.i 



(8.7-8) 



The series (8.7.8) converges absolutely for $0 \le \beta < \tfrac{1}{2}$ and conditionally
for $\tfrac{1}{2} < \beta < 1$. The remaining cases are






(8.7.9) 



and 



(8.7.10) 



The expected discounted reward starting from state 1 is 

_ 2 2 2 

V (M) » £ p.. (H) r, + £ E S.^T. (H» p (M) r . (8.7.11) 

Using the reward matrix (8.1.9) and the formulas (8.7.6) and (8.7.9),
we obtain, upon collecting terms,
_ ma + nb np •* k 



V\(fl) a ■ ' * — - 

1 a (l-P)(iM-n) (ic 



t. i (^ p k («i) v 

k*0 \*=0 



(®). (p) 



(aattH-1) (p-wO 
k-%» v 



c(p*s>) + dq 



p=w$*>\» 



a(srt-k-»«) * b(n+i) 
»ms-i*k-\» 



(8.7.12) 



In a similar manner we find

• cp + dq p ook. Wta.J p )«*i 



(i~$)(p*q) l-{3 k«0 v»0 (ntn) t (p»q) „ 

[a(afk-v) + ten e(p*-s*i) + dq 
~ ™~_ - . (8.7.13) 

s>fmk-v p*q*v*l 

It can be shown that $\bar V_1(M)$ and $\bar V_2(M)$ satisfy equation (4.3.9).

8 * 9 a Genera^ mU^ ^fr ft Hwflft frX 9.tet 

N. Z. Shor [37] has considered a game-theoretic model of a two-state
Markov chain with alternatives and rewards. He shows that under certain
circumstances each player should act so as to maximize his expected one-step
transition reward. This result is generalized here.

Consider a two-state Markov chain with $K_i$ alternatives in state
i (i=1,2). Assume that the rewards depend only on the initial and final
states i and j and not on the alternative used in making a transition
from i to j. Assume further that the reward matrix is

$$R = [r_{ij}] \tag{8.8.1}$$



tihere e if r is any real number* 



k 
P ii 


a 


V 




k 
*1* 


8 


r + 


A % 


k 
21 


s 


r * 


? 



Is&U •••• K« (8.8.2a) 



fe»l» ...» K 4 (8.8.2b) 

1 



k»i, ..., K (8.8.2e) 



r k « r*£ + A-. k"i» ..., K (8.8.2d) 



2? * V ~2 y 2 



We 



require that ^0 and A_ ^A.^- 0, 



k 
Let t ^Lp. be the matrix of alternative transition probabilities 

and let ff have the prior distribution function H(flt). If F^(p\f) 



*. » 




is the marginal distribution function of Jtsi)* it is assumed that, for 
all ieE, F r (P|t) is continuous on the boundary of >6^. 

The expected gain of the system under the policy $\sigma$ is $\bar g(\sigma, \psi)$.
Suppose it is desired to choose a policy, $\sigma^*$, which maximizes the
expected gain,

$$\bar g(\sigma^*, \psi) = \max_{\sigma \in \Sigma} \{\bar g(\sigma, \psi)\}. \tag{8.8.3}$$

We shall show that, with the reward structure (8.8.2), it is sufficient to

solve the corresponding deterministic problesa for <?(**) e £[£ \ *^ 3 

and that the optimal policy , j£. « ( <*" , 0"l ) f is determined by the 

1 z 



(8.8.fc) 



Let $G_i^{(n)}(\sigma, P)$ be the conditional total expected reward in n
transitions under the policy $\sigma$ when the system starts from state i (i=1,2)
and $\tilde P(\sigma) = P$. Let $\bar G_i^{(n)}(\sigma, \psi)$ be the corresponding unconditional
expected reward,

o[ n) (£L^ ) - J G< n) ( itWF^ (P \ t ). (8.8 5) 



JC€£ 



Jtan J&aSal. Fop n^l»2 ... and all J£.®2 f 



G™(£*+) - ^(J&t) * A t . (8.8.6) 

Proof- He show that, for all PeJ+ 



*. • 



S ( 4 R) <r.|) - G< n) (£,P)* A t i n»i,2.... (8.8.7) 

1 a 2 * * ^cS 

from ehioh ©qiaation (8.8.6) foUsera, sine© ^e»^ - ^1 is a set of 
measure aero relative to F^ (P 



Let $\bar f_{ij}(u,n)$ be the expected number of transitions from state i to
state j in n transitions when the system starts from state u. Then, for
all P,

i=1,2 (8.8.8)

If P is represented as in (8.1.1) and the eigenvalues of P are $\lambda_1 = 1$ and
$\lambda_2 = 1-x-y$, we can use the spectral representation of P given by (8.1.4)
together with the expression for $\bar f_{ij}(u,n)$ given by equation (6.1.15) to
obtain

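$$G_2^{(n)}(\sigma, P) - G_1^{(n)}(\sigma, P) = \frac{1 - \lambda_2^n}{1 - \lambda_2} \, [\rho - x \Delta_1 + (1 - y) \Delta_2].$$   (8.8.9)

Since $\rho \ge 0$ and $\Delta_2 \ge \Delta_1 \ge 0$,

$$G_2^{(n)}(\sigma, P) - G_1^{(n)}(\sigma, P) \ge \frac{1 - \lambda_2^n}{1 - \lambda_2} \, (1 - x - y) \Delta_1 = \frac{1 - \lambda_2^n}{1 - \lambda_2} \, \lambda_2 \Delta_1.$$   (8.8.10)

If $0 \le \lambda_2 < 1$, then

$$G_2^{(n)}(\sigma, P) - G_1^{(n)}(\sigma, P) \ge 0 \ge -\Delta_1,$$   (8.8.11)

whereas if $-1 < \lambda_2 < 0$,

$$G_2^{(n)}(\sigma, P) - G_1^{(n)}(\sigma, P) \ge \lambda_2 \Delta_1 (1 + \lambda_2 + \lambda_2^2 + \cdots + \lambda_2^{n-1}) \ge \lambda_2 \Delta_1 \ge -\Delta_1.$$   (8.8.12)

In either case, we obtain (8.8.7). Q.E.D.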
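A numerical spot check of the lemma may be helpful. The Python sketch below is an illustration only, not the author's program: it assumes the reward layout $r_{11} = r$, $r_{12} = r + \Delta_1$, $r_{21} = r + \rho$, $r_{22} = r + \rho + \Delta_2$ as read from (8.8.2), writes $x = p_{12}$ and $y = p_{21}$, and uses hypothetical function names. It builds the $n$-step total expected rewards by direct recursion for randomly drawn positive chains and tests the bound $G_2^{(n)} - G_1^{(n)} \ge -\Delta_1$.

```python
import random

def total_rewards(p12, p21, r, rho, d1, d2, n):
    """Return (G1, G2), the total expected rewards in n transitions
    when the chain starts in state 1 or state 2."""
    q1 = r + p12 * d1                  # expected one-step reward in state 1
    q2 = r + rho + (1.0 - p21) * d2    # expected one-step reward in state 2
    g1 = g2 = 0.0
    for _ in range(n):
        g1, g2 = (q1 + (1.0 - p12) * g1 + p12 * g2,
                  q2 + p21 * g1 + (1.0 - p21) * g2)
    return g1, g2

def lemma_holds(trials=2000, n_max=25, seed=0, tol=1e-9):
    """Check G2 - G1 >= -d1 on random positive two-state chains."""
    rng = random.Random(seed)
    for _ in range(trials):
        p12 = rng.uniform(0.01, 0.99)
        p21 = rng.uniform(0.01, 0.99)
        r = rng.uniform(-5.0, 5.0)
        rho = rng.uniform(0.0, 3.0)          # rho >= 0
        d1 = rng.uniform(0.0, 3.0)           # Delta_1 >= 0
        d2 = d1 + rng.uniform(0.0, 3.0)      # Delta_2 >= Delta_1
        for n in range(1, n_max + 1):
            g1, g2 = total_rewards(p12, p21, r, rho, d1, d2, n)
            if g2 - g1 < -d1 - tol:
                return False
    return True

print(lemma_holds())
```

A passing check does not replace the proof; it only guards against a misreading of the signs in the reconstructed inequalities.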



Lemma 8.8.2. For $i = 1, 2$ and $\sigma \in \Sigma$,

$$\lim_{n \to \infty} \frac{1}{n} \bar{G}_i^{(n)}(\sigma, \psi) = \bar{g}(\sigma, \psi).$$



SLaea 



_(n) 



2 ? . 



<C '(£,?) « S E f^ 



o»a p«i 



op* 



(8.8,13) 



(8.8.ik) 



equation (6.1.1*0 yields 



2 2 



a»i 



BMKJJrw 



°t (c ' t) 'it ^i*»p^, B 2 ^¥i 

ia 



(8.8.15) 



(8.8.16) 



Let $\epsilon > 0$ be given. By a trivial extension of Theorem 4.2.5, there
exists an integer $\nu > 0$ such that, if $k > \nu$,



"lO^i- 8 ****' 



(8.8.1?) 



Then, for $n > \nu$,



1 



a»l 



«r <*(&) 



VV« 1- S £ E * [ V 






-c e 



for $n$ sufficiently large. Thus,



1151 * S "c^^' 1 " We** 1 

n-*o» n k=0 - «p - » 



and, fc^ (^.^.^), 



$$\lim_{n \to \infty} \frac{1}{n} \bar{G}_i^{(n)}(\sigma, \psi) = \bar{g}(\sigma, \psi).$$



(8.8.16) 



(8.8.19) 



(8«8.g0) 



Q.E.D. 



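Theorem 8.8.2. Let $\sigma^* = (\sigma_1^*, \sigma_2^*)$ be a policy such that

$$\bar{p}_{i2}^{\sigma_i^*}(\psi) = \max_{1 \le k \le K_i} \{ \bar{p}_{i2}^{k}(\psi) \}, \qquad i = 1, 2.$$   (8.8.21)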



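Then

$$\bar{g}(\sigma^*, \psi) \ge \bar{g}(\sigma, \psi), \qquad \sigma \in \Sigma.$$   (8.8.22)

We first establish by induction that

$$\bar{G}_i^{(n)}(\sigma^*, \psi) \ge \bar{G}_i^{(n)}(\sigma, \psi), \qquad i = 1, 2.$$   (8.8.23)

For $n = 1$, (8.8.23) follows from (8.8.21), since

$$\bar{G}_1^{(1)}(\sigma, \psi) = r + \bar{p}_{12}^{\sigma_1}(\psi) \Delta_1, \qquad \bar{G}_2^{(1)}(\sigma, \psi) = r + \rho + \bar{p}_{22}^{\sigma_2}(\psi) \Delta_2.$$   (8.8.24)

Assume (8.8.23) holds for $n$. For $i = 1$,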

= r * pjff )U, * B< n) (-:+) - o[ B> < <r>)] ■> Z^hzl+h 

(8.8.25) 
Since $\sigma^*$ is an optimal policy for a transition interval of length $n$, we
have, for all $\sigma \in \Sigma$,

4^ S) <2:.t) £ r * pJn-K A 1 * o| n) (2::t) - G { *\rtf )3 * o^fif.t ) 

(8.8*26) 

and, by (8.8.21) and Lemma 8.8.1,

MjjjCt) - P$t)l CA t ♦ 0« n) (2?,t) - o[ n) (2:>>] 

&0. (8.8.S?) 

Similarly,

B^cc:*) - r + ? ♦ »Jt) u g ♦ S^ctf) - S^t*)] * S^t*). 



(8.8.28) 



and, since $\Delta_2 \ge \Delta_1$,



B^crrt)-^ ^^) 



iO t crcr (8.8.29) 

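proving the induction.

Equation (8.8.23) and Lemma 8.8.2 together imply that, for all $\sigma \in \Sigma$,

$$\bar{g}(\sigma^*, \psi) = \lim_{n \to \infty} \frac{1}{n} \bar{G}_i^{(n)}(\sigma^*, \psi) \ge \lim_{n \to \infty} \frac{1}{n} \bar{G}_i^{(n)}(\sigma, \psi) = \bar{g}(\sigma, \psi).$$   (8.8.30)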



Q.E.D. 



CHAPTER 9 

CONCLUDING REMARKS

In the foregoing chapters we have described a formal structure for
certain broad classes of sequential sampling and fixed sample-size decision
problems in a Markov chain with unknown transition probabilities. Since
there is very little theory in this area, most of our efforts have been
directed toward answering questions of existence and convergence. For
this reason the portions of this report which deal with numerical
computation set forth the obvious, but not necessarily the most efficient,
ways to approach problems of calculation. It does seem clear, however,
that, for problems with a large number of states in which a high degree of
accuracy is required, we must think in terms of hours, not minutes, of
computer time. This is not to say that the Bayesian method of dealing
with Markov chains with uncertain transition probabilities must be abandoned
as impractical. But it must be recognized that, at the present state of
the art, the Bayesian treatment is probably most practical for problems with
two or three states, loose prior distributions, and large differences in
the rewards associated with different actions. As problems tend to differ
from these criteria, the decision-maker must balance increasing computation
time against the required accuracy of the solution and choose an appropriate
approximation.
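The growth in computing effort can be made concrete with a small Python sketch. It is an illustration under an assumption: that a value is obtained by a naive recursion which, at every stage, branches on each (alternative, next state) pair, as the recursive function VMAX. of Appendix B appears to do. The function name is hypothetical.

```python
def recursive_calls(num_states, num_alternatives, horizon):
    """Count the function calls made by a naive recursion that branches
    on every (alternative, next state) pair at each of `horizon` stages."""
    branching = num_states * num_alternatives
    calls = 1                        # the call with no transitions remaining
    for _ in range(horizon):
        calls = 1 + branching * calls
    return calls

# Two states and two alternatives versus five states and three alternatives.
for horizon in (5, 10, 15):
    print(horizon, recursive_calls(2, 2, horizon), recursive_calls(5, 3, horizon))
```

The count grows geometrically in the horizon, with ratio equal to the number of states times the number of alternatives, which is consistent with the remark that small state spaces are the practical case.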

There are numerous questions of immediate interest which remain
unanswered, some theoretical and some numerical. Many of these are listed
below.



1. Certain error bounds which were derived in Chapters 3, 4, and 5
depend on the discount factor, $\beta$, but not on $\psi$, the parameter of the
distribution. These bounds should be made tighter for specific prior
distributions by including factors which involve $\psi$.

2. The rate of convergence of the successive approximations methods
developed in Chapters 3, 4, and 5 depends upon the choice of terminal
functions. Classes of terminal functions which accelerate this convergence
rate should be investigated.

3. The analysis of undiscounted adaptive control models by letting
$\beta \to 1$ in the corresponding discounted problem may provide a workable
approach to a difficult matter. The remarks of Section 3.6 are relevant
in this connection.

4. The question of the uniqueness of solutions to equations {k.2«Wj
and (4.2.55) is of considerable importance for the calculation of the
means and product moments of the steady-state vector $\tilde{\pi}$ when a method
of successive approximations is used. The problem of the convergence of
the approximant $\bar{\pi}(n, \psi)$, as defined by equation (4.2.42) with the
terminal function (4.2.51), is also of importance.

5. In the terminal control models of Chapter 5 it is necessary to
evaluate expressions of the form



V" 



and 






At present the only method of finding the maximizing policy $\sigma^*$ is by direct
search over the elements of $\Sigma$. More efficient methods of finding $\sigma^*$ should
be investigated. Approximations to $\sigma^*$ of the sort described in Section 5.4
should also be studied.

6. A formal analysis of the undiscounted terminal control models III
and IV, which were introduced in Section 5.5, should be carried out. This
analysis would examine such questions as the existence and uniqueness of
solutions, the convergence of successive approximation methods, and whether
a terminal decision point is reached with probability one in an optimal
sampling strategy. In this regard it is to be noted that equations (5.5.2)
and (5.5.3) can be made more precise by replacing the expression

2S \^(£.f >} 
by the expression

where $v_i(\sigma, \psi)$ is the expected relative value* of starting the system in
state $i$ and operating it indefinitely under the policy $\sigma$ when the prior
distribution function of $\tilde{\mathcal{P}}$ is $H(\mathcal{P} \mid \psi)$. Methods of computing $v_i(\sigma, \psi)$
have not yet been studied.

7. There are well-known difficulties in assigning a multivariate
prior distribution to the elements of $\tilde{\mathcal{P}}$ in such a manner as to accurately
reflect the decision-maker's state of knowledge. It would be of considerable
interest, therefore, to investigate the sensitivity of some of the models
of foregoing chapters to relatively small changes in the prior distribution.



* Cf. Howard [22], Ch. 4.

In addition to these and other immediate questions which arise in
connection with the research reported in this study, there are several
fairly obvious directions in which this research can be extended. For
example, many of the results and techniques developed here should be
capable of extension to decision problems in a semi-Markov chain in which
both the transition probabilities and the parameters of the holding-time
distributions are uncertain.

More general stochastic processes should be amenable to Bayesian
analysis, although different techniques than those utilized here may be
required. The Wiener process, for example, can be analyzed with the
existing Bayesian theory for normal processes.



APPENDIX A
GLOSSARY OF SYMBOLS



8(p 9 q) 

B N (ffl) 

P 



eevC*] op covC* ] 
©iCn.t) 

\(S.t) 

sC»] or EC- ] 

*.i 

f^Cu-a) 






Explanation

Beta function.

Generalized beta function.

Discount factor.

Sampling cost when system is
observed in state i.

Expected one-step sampling cost
when process is in state i and
alternative k is to be used.

Maximum sampling cost.

Covariance operator.

Error of nth successive approximant
in adaptive and terminal control



Defined 



f 



(H) 



Srror of the terminal deelei&n 0% 

Expectation operator.

Sum of ith row of transition count.

Sum of ith column of transition count.

Number of transitions from state i
to state j in n transitions when
process starts from state u.

Expected value of f_ij(u,n).

Multivariate beta probability
density function.

Nonstandardized multivariate beta
probability density function.



i?0 

38 

118 

119 



52 
127 

m 



151 

153 

155 
170 

176 






(N) 
fpy ( P | u 9 n,M) 

JK.M) 



(fla.% 



P_(P It) os>F(P(+) 



r s 



p g {a»Piie B »Y»v"is«y) 



g(p> 

IB 



g<t) org(r»t) 

H<i\t) 



nifMilLirtff 


Defined 


Beta*Hhittle probability sass 
Amotion* 


19* 


Betfr*Hhlttla»2 probability 
mass gfemtion. 


195 


Hon3tandax4 beta»Whlttle-2 
ssass function* 


196 


Matrix beta probability density 
Amotion* 


180 


Ronstandardlsed smtilx beta 
p3n0babS.il V density Asnotion. 


186 


Hatvisc beta*! probability density 
Amotion* 


190 


Whittle probability mass Amotion 


153 


Whittle! probability aaas 
Amotion* 


161 


tMt.il©»2 probability mass 
Aanotion* 


162 


Transition ooont* 


151 


Multivariate bete probability 


m 



distribution taetion* 

Marginal distribution Amotion 
of the K rows of jf specified 

ty a> 

App@H»s second !^p©ygeofBeti , &© 
Sanation of two ergoaents* 

ftaslly of probability distribution 
Amotions* F(F \t). 

m 

Expected retsard per troneltion in the 
steejfy state, or process gain, whan 
p» P. 

« o 

Espaoted gain under the policy <£*" 

Gassaa function. 

Probability distribution Amotion ffer 
the generalised stochastic matrls j « 



8 



3^ 

108 

33 



8 



pj"' n -stag, transition probability 

whan PsP« 

«s a 



r lJ 



P. 



ij % ' • **J 



lander the polio? 2"» 



vector. 



-2T(n 9 *) 



2:3iy^: 




Mflttdnp 

MMMn 

#~ Fatally of probability distribution it 

functions, H(f (*). 

-^^U l£ ) likelihood function £©r the sample a . 11 

£ ° CP4 J A K x K generalised stochastic matrix ? 

Pfej) ■ P « [p ] An K x K stftofaastic matrix consisting 7 

** of K rosm of £ specified bgr £-* 

p* »»». * *• -w, p. - 

B SS 

9f«(T) Expected value of t£ «. &i 



J^W^OEVt') Bspected n-step transition probability 69 



2T» (7rr,... t 7^) Stead^stat© probability vector 73 

1 of an ©rgodte Markov chain when 

?• P. 

8 B 

(^(^••♦••^W) Expected steady-state probability 78 



(t?H (n* t* ) t • • • ^(n.*) ) 011© nth saccessiv® approximation to 8? 

7/^( H Expected vala© of *t^ fK. 9*> 

<p> (a»v t ft e g) Set of all transition counts Qt 15$ 

sis© n which start in state u and 
end in state v whan p is the matrix 
of transition probabilities* 

^(ta^nt g) Set of aH transition counts of sic© 152 

n which start in state ta when g 
is the matrix of transition 



<*L (a*P) Set of all transition counts of 162 

88 sis® n when g is the matrix of 

transition probabilities* 






*qW 






(» 



%j(r> 



Defined 



13 of sis© n %sbtoh start and end la 

tbe ©aiao state tsfcan P is the 

mtiAx of transltion^psiob&b&litieQ*. 

• c 

<P £p (n*P) Set of all transition octants of 163 

83 s&se n which start and end in 

different states whan P is the 
matrix of transition pSrobabUitiee* 

4 r K (n»v,n) Set of all transition counts of alee 192 

n which start in state u and end in 
state v when the aatrix of transition 
probabilities is positive* 

*£«(u f n) Set of all transition counts of 192 

slss n which start in state u when the 
aatrix of teans&tlon probabilities is 
positive* 

^An) Set of all transition counts of else 195 

w n whew the matrix of transition 

probabilities is positive* 



Set of all transition counts of sise 195 

n which start and end in the same 
state when the matrix of transition 
probabilities is positive* 



*ft nP W Set of all transition oounts of sise 195 

1 n which start and end in different 

states «h«a the oatfix of transition 
probabilities is positive. 

T Oenerio syc&ol for the peraseters of 8 

a probability distribution function* 

¥ AdaAssable pararaeter set* 11 

True state of natsire* 2 3 



Expected one»step transition regard ; 

when the system is in state 1 
and alternative k is to be used* 

a© n-step transition probability 8* 

tinder the policy £* «nan <g is the 
true state of nature* 






s< n) <r»P.f) op 



«|> '0*f) EXpaeted discounted reward in 71 

* n transitions when the system 

starts in staid £« 

» a ^4^ ^** K x K matrix of one-step 7 

*J transition retsards* 

R (£) s Ho [?y] th«Kx N matrix of o»e»step transition ? 

rewards consisting of the K rows of 

<R specified fey £ • 

H Maximum element of 2 • 51 

r Minimum element of J . 5% 

H* The element of t with the ^5 

largest absolute value* 

r the elaaent of $. with the smallest ^5 

absolute value* 

FL.(a) Renge set of a random veotor t&th 176 

the nonstandardlaed Btiltivariate 
beta distribution* 



"M 






2J 



Range set of a random matrix tAth 185 

the nonatandardl&ed matrix beta 
distribution. 

Set of all K x N gssasralised 3 

stoehastio matrices. 

Set of aU R x N stoehastio 8 

matrices* 



<#L Set of all R x R positive stochastic ?fc 

matrices* 



, a Set of all R x 13 stoehastio matrices ?** 



with elements in Uie closed interval 



HT e (*j# •••»*}$) JteHoy vector* 7 

£ Set of all policy vectors, <£"• 8 




Defined 
ISBflflftOK oi ^ P ftfffi 

$*( t ) FaraBeter of the posterior 12 

3 distribution of J when the 

parameter of the prior distribution 
is f and a transition fro© state 1 
to 3tato j under alternative k is 



T i1^' Ftoaoetdi? of the □osterios' 12 

J distribution of g when the 

pararaeier of the prior distribution 
is H" and a transition froas state 1 
to state J is observed* 

\ <.(**•£:• *P) flararaeter of the posterior 130 

J distribution of g when the parameter of 

the prior distribution is T , the 
systcn starts in state i and is observed in state j 
after n transitions under the pelley <r. 

v. ( ^) ESaaeeted total discounted reward over *!0 

1 an infinite period when the syeten 118 

starts in state 1 and an optical 130 
sampling strategy is followed* 

\ dh ^) 'She nth successive approximation b6 

% tov 4 (t) 123 

* 13^ 

\( V* v ) Espeeted total reward over a period 1"5 

1 -u&th terminal operation phase of Iftc 

length v whan the system starts from 
state i and an optimal sampling 
etaategsr is fo320&ed* 

\( ttX) Expected total discounted reward over W. 

* an infinite period when the system 

starts in state 1 with the policy £" 
In f oroe and an opUaal sampling 
strategy is followed (discounted 
process with seto»up cost)* 

e 
v Hlntaa of a set of constant terminal $1 

reward functions* 

V* Hsxi&an of a set of constant terminal 51 

reword functions* 

\(f) Sermln^ reward function. 

V Bound on the terminal reward functions* 4? 




Defined 

S&&0. teufts& aa^sa 

V-CP) Ea^jeeted total disecraated raspyd 96 

over m^ infinite period when g » g« 
the aysfcaa starts £fcxxa state 
i» acd & lisad policy Is used. 

«■ «• 

\C £»'*') or \(t) Eapeotsd total disasantea posmd 97 

over en Infinite period when the 
fig^atea starts froa state i and Hie 
p&liccr £ is used* 

or 

V« (n 9 f ) lhe_nth stteoassiva spproxioatioa 100 

* to v t Ctt. 

*ar[»3 or vas£* 3 S» variance operator* ~ ■ 

x a (s ^ t ... t jt) a eas$&s of n * i states ©eeopied it 

^ * ^ n by a Karicov ebain. 



APPENDIX B

PROGRAM VITERATION TO SOLVE
EQUATIONS (3.2.1) AND (3.2.2).



RPROGRAM NAME IS VITERATION. 

RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I, T1, M) FOR

RI=1,...,N AND T1=S,...,T, FOLLOWING AN OPTIMAL POLICY.

RREWARD MATRIX IS R AND THE TERMINAL REWARD VECTOR IS RHO.

RTHE MAXIMIZATION IS OVER THE MU(I) ALTERNATIVES IN STATE I.

RBETA IS THE DISCOUNT FACTOR. A MATRIX BETA PRIOR IS

RASSUMED.

PROGRAM COMMON R »RH0»RDIM» IND»MU»N »BETA, IND1 ,LIST»T0P » 
0INDIM,P0L 

INTEGER IND»MU,N,IND1,T0P»S»T.I, J,K»MAXSP ,N1 ,POL ,V2 

DIMENSION R<500»RDIM> ,M( 500,RDIM> ,RHO< 10 > » IND< 100, INDIMJ • 
0MU(10),INDK10)»LIST(21000)»V1(10)»V2(10) 

VECTOR VALUES RDIM=3, 1,0,0 

VECTOR VALUES INDIMs2»l»0 
READ READ FORMAT INPUTl, N»S,T,MAXSP»BETA 

PRINT FORMAT 0UT1A, BETA 

RDIM(2)-N 

RDIM(3)*N 

INDIM(2)*N 

INDKD-0 

IND(1»1)=0 

THROUGH ALFAlt FOR K=1,1»K.E.N 
ALFA1 IND(1»K+1)=K*N 

N1=N*N 

THROUGH ALFA* FOR K=l ,1 ,K.E.MAXSP 

INDKK+1>»K*N 

IND(K+1»1)=K*N1 

THROUGH ALFA, FOR 1=1,1 »I.E.N 
ALFA IND(K+1»I+1 >=K*N1+I*N 

READ FORMAT INPUT2, MU(1) • •« MUCN) 

PRINT FORMAT OUTlEt ( 1=1 ,1 » I .G.N, MU(I)) 

READ FORMAT INPUT3* R<1»1»1) ... R <MU(N ) »N»N) , M<1,1*1) 
0... M(MU(N)»N»N>» RHO(l) ... RHO(N) 

PRINT FORMAT 0UT1D, ( I=*l * 1 » I .G.N, RHO(I)) 

PRINT FORMAT 0UT1B» < 1*1 »1 * I .G.N, <K=1,1 ,K.G.MU( I ) , 
0< J=1,1,J.G.N, M(IND(IND1(K) + I)+J)) )) 

PRINT FORMAT 0UT1C, ( 1*1 ,1 * I .G.N, (K=l ,1 »K.G.MU( I ) » 
0(J*1,1, J.G.N, R<IND(IND1CK> + I)-kJ>) )) 

SET LIST TO LIST 

LIST = 

THROUGH DELTA, FOR K=S,1,K.G.T 
THROUGH GAMMA, FOR I=1»1»I.G.N 
V1(I)=VMAX.( I,K,M) 
GAMMA V2( I)=POL 






PRINT FORMAT OUT2* K» VlQ) 
DELTA PRINT FORMAT OUT3* V2<1> • • 
TRANSFER TO READ 



... VKN) 
V2(N) 



RFORMAT SPECIFICATIONS 
VECTOR VALUES INPUT1=$4I 10»F10.5*$ 
VECTOR VALUES INPUT2=$10I7*$ 
VECTOR VALUES INPUT3»$ ( 7F10.5 >*$ 
VECTOR VALUES 0UT1A3$7H1BETA *»G15.5*$ 
VECTOR VALUES 0UTlB«$6H0 M =»8G15.5/< 8G15.5 )*$ 
VECTOR VALUES 0UTlC»$6H0 R *»8G15.5/< 8G15.5 )*$ 
VECTOR VALUES 0UTlD=$6H0RH0 =»8G15.5/8G15.5*$ 
VECTOR VALUES OUTlE«$5HOMU «*10I5*S 
VECTOR VALUES OUT2*$8H0FOR T **I2»14H V(I» T» M) 

08G15.5*$ 
VECTOR VALUES OUT3=$7H POLICY, 1015*$ 
END OF PROGRAM 



»»7G15.5/ 



EXTERNAL FUNCTION (I1, N1, M)
ENTRY TO VMAX.

RTHIS FUNCTION RECURSIVELY COMPUTES MAX V(I1,N1,M)=Y, THE
RMAXIMUM EXPECTED RETURN IN N1 STEPS IF THE SYSTEM STARTS IN
RSTATE I1 WITH PARAMETER MATRIX M. PRIOR DISTRIBUTION IS
RMATRIX BETA. MAXIMIZATION IS OVER THE MU(I1)
RALTERNATIVES IN STATE I1.

PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
0INDIM,POL
INTEGER I1,N1,I,N2,K,IND,MU,N,RDIM,J,IND1,TOP,POL
DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
0LIST(21000),MU(10),IND1(10),INDIM(2),TM(500,RDIM)
I=I1
N2=N1
Y=1.E-35

WHENEVER N2.E.0, FUNCTION RETURN RHO(I)
THROUGH ALFA, FOR K=1,1,K.G.MU(I)
MSUM=0.

THROUGH PHI, FOR J=1,1,J.G.N
PHI MSUM=MSUM+M(IND(IND1(K)+I)+J)
STOR=0.
THROUGH GAMMA, FOR J=1,1,J.G.N

SAVE RETURN

SAVE DATA N2,POL,MSUM,STOR,Y,M(K,I,J) ... M(MU(I),I,N),I,J,K

EXECUTE TR1.(I,J,K,M,TM)

X=VMAX.(J,N2-1,TM)

RESTORE DATA K,J,I,M(MU(I),I,N) ... M(K,I,J),Y,STOR,MSUM,
0POL,N2

RESTORE RETURN
GAMMA STOR=STOR+(M(IND(IND1(K)+I)+J)/MSUM)*(R(IND(IND1(K)+I)+J)
0+BETA*X)

WHENEVER STOR .LE. Y, TRANSFER TO ALFA

Y=STOR

POL=K
ALFA CONTINUE

FUNCTION RETURN Y

END OF FUNCTION



EXTERNAL FUNCTION (I1,J1,K1,M,TM)
ENTRY TO TR1.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RTR1.(I1,J1,K1,M)=TM, WHEN A TRANSITION IS OBSERVED FROM I1 TO
RJ1 UNDER ALTERNATIVE K1. PRIOR DISTRIBUTION IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,MU,N,BETA,IND1,LIST,TOP,
0INDIM,POL

INTEGER I1,J1,K1,I,J,K,IND,MU,N,IND1

DIMENSION R(500,RDIM),RHO(10),RDIM(3),IND(100,INDIM),
0LIST(21000),MU(10),IND1(10),INDIM(2)

THROUGH ALFA, FOR I=1,1,I.G.N

THROUGH ALFA, FOR J=1,1,J.G.N

THROUGH ALFA, FOR K=1,1,K.G.MU(I)
ALFA TM(IND(IND1(K)+I)+J)=M(IND(IND1(K)+I)+J)

TM(IND(IND1(K1)+I1)+J1)=M(IND(IND1(K1)+I1)+J1)+1.0

FUNCTION RETURN

END OF FUNCTION
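For readers without access to a MAD compiler, the recursion carried out by VMAX. and TR1. can be sketched in Python. This is a paraphrase under assumptions, not a transcription: nested lists replace the linear subscripting through IND and IND1, every state is given the same number of alternatives (the listing allows MU(I) to vary), the running maximum starts from the first alternative instead of a preset constant, and all names are hypothetical.

```python
def posterior(m, k, i, j):
    """Return a copy of the parameter array m with one more observed
    transition from state i to state j under alternative k."""
    tm = [[row[:] for row in alt] for alt in m]
    tm[k][i][j] += 1.0
    return tm

def vmax(i, n, m, r, rho, beta):
    """Return (value, alternative) for state i with n transitions to go.

    m[k][i][j] : matrix beta prior parameters
    r[k][i][j] : one-step transition rewards
    rho[i]     : terminal rewards
    beta       : discount factor
    """
    if n == 0:
        return rho[i], None
    best_value, best_alt = None, None
    for k in range(len(m)):
        msum = sum(m[k][i])
        value = 0.0
        for j in range(len(rho)):
            # Mean transition probability times (reward + discounted future).
            future, _ = vmax(j, n - 1, posterior(m, k, i, j), r, rho, beta)
            value += (m[k][i][j] / msum) * (r[k][i][j] + beta * future)
        if best_value is None or value > best_value:
            best_value, best_alt = value, k
    return best_value, best_alt

# Hypothetical two-state, two-alternative example: reward 1 for entering state 2.
m = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 3.0], [2.0, 2.0]]]
r = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(vmax(0, 3, m, r, rho=[0.0, 0.0], beta=0.9))
```

Copying the parameter array on each call plays the role of the SAVE DATA and RESTORE DATA statements in the listing.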



APPENDIX C 
PROGRAM PHI MATRIX TO

COMPUTE EQUATION (4.1.2)



RPROGRAM NAME IS PHI MATRIX 

RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF PHI(I,J,T1,M)

RFOR I,J=1,...,N AND T1=S,...,T. A MATRIX BETA

RPRIOR IS ASSUMED.

PROGRAM COMMON IND»N» J*LIST*T0P*MDIM 

INTEGER N»IND» I »J»S.T,K»TOP 

DIMENSION M < 100 »MDIM )» INDUO)»LlST{ 21000 )»F< 10) 

VECTOR VALUES MDIM*2*1»0 
READ READ FORMAT INPUTl* N*S*T 

MDIMC2)*N 

IND(1)=0 

THROUGH ALFA* FOR K=1»1»K.E.N 
ALFA IND(K+1)*K*N 

READ FORMAT INPUT2* M<1»1) ••« M(N»N> 

PRINT FORMAT OUTl* N»S*T» < K»l *1 >K.G.N» < L=l »1 >L.G.N» 
OM(IND(K)+L) >) 

SET LIST TO LIST 

LIST=0 

THROUGH GAMMA* FOR K*S»1»K.G.T 

THROUGH GAMMA* FOR 1=1*1*1. G.N 

THROUGH DELTA* FOR J=1*1»J.E.N 
DELTA F< J)=PHI.< I*K*M) 

F(N)=1.0 

THROUGH EPSf FOR J*1»1»J.E.N 
EPS F(N)*F(N)-F(J> 

WHENEVER I.F.I 

PRINT FORMAT 0UT2* K» F<1> ... F(N) 

OTHERWI SE 

PRINT FORMAT 0UT3* F(l) ... F<N> 
GAMMA END OF CONDITIONAL 

TRANSFER TO READ 



RFORMAT SPECIFICATIONS 

VECTOR VALUES INPUTl=$3I 10*$ 

VECTOR VALUES !NPUT2«$< 7F10.5 )*$ 

VECTOR VALUES 0UTl*$3HlN=*I5»4H S=.I5.4H T=,I5/ 
0(1H * 8G15.5)*$ 

VECTOR VALUES OUT2»$7H0FOR T»* 12* 15H PHI < I » J»T*M ) a » 
06G15.5*$ 

VECTOR VALUES 0UT3=$S24* 6G15.5*$ 

END OF PROGRAM 






EXTERNAL FUNCTION (I1, T1, M)
ENTRY TO PHI.

RTHIS FUNCTION RECURSIVELY COMPUTES PHI(I1,J,T1,M)=Y, THE
RPROBABILITY THAT AT TIME T1 THE SYSTEM WILL BE IN STATE J,
RGIVEN THAT AT TIME 0 IT WAS IN STATE I1 WITH PARAMETER
RMATRIX M. PRIOR IS MATRIX BETA.

PROGRAM COMMON IND,N,J,LIST,TOP,MDIM

INTEGER I1,J,T1,I,T,K,N,IND,TOP,MDIM

DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM)

I=I1

T=T1

MSUM=0.

THROUGH ALFA, FOR K=1,1,K.G.N
ALFA MSUM=MSUM+M(IND(I)+K)

WHENEVER T.E.1, FUNCTION RETURN M(IND(I)+J)/MSUM

Y=0.

THROUGH BETA, FOR K=1,1,K.G.N

SAVE RETURN

SAVE DATA Y,T,MSUM,M(1,1) ... M(N,N),I,K

EXECUTE TR.(I,K,M,TM)

X=PHI.(K,T-1,TM)

RESTORE DATA K,I,M(N,N) ... M(1,1),MSUM,T,Y

RESTORE RETURN
BETA Y=Y+(M(IND(I)+K)/MSUM)*X

FUNCTION RETURN Y

END OF FUNCTION



EXTERNAL FUNCTION (I,K,M,TM)
ENTRY TO TR.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RT.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
RPRIOR IS MATRIX BETA.

PROGRAM COMMON IND,N,J,LIST,TOP,MDIM
INTEGER I,K,IND,J,L,N,J1,MDIM,TOP
DIMENSION IND(10),MDIM(2),LIST(21000)
THROUGH ALFA, FOR J1=1,1,J1.G.N
THROUGH ALFA, FOR L=1,1,L.G.N
ALFA TM(IND(J1)+L)=M(IND(J1)+L)

TM(IND(I)+K)=TM(IND(I)+K)+1.0
FUNCTION RETURN
END OF FUNCTION
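The recursion in PHI. and TR. can be sketched in Python as follows. The sketch is a paraphrase with hypothetical names, assuming a square list of lists of matrix beta parameters in place of the linearly subscripted array M, and passing the target state j as an argument where the listing keeps it in PROGRAM COMMON.

```python
def observe(m, i, k):
    """Posterior parameters after one observed transition from i to k."""
    tm = [row[:] for row in m]
    tm[i][k] += 1.0
    return tm

def phi(i, j, t, m):
    """Expected t-step transition probability from state i to state j
    when the transition matrix has a matrix beta prior with parameter m."""
    msum = sum(m[i])
    if t == 1:
        return m[i][j] / msum        # prior mean of p_ij
    # Condition on the first transition i -> k, update the prior, recurse.
    return sum((m[i][k] / msum) * phi(k, j, t - 1, observe(m, i, k))
               for k in range(len(m)))

# Hypothetical uniform prior on a two-state chain.
m = [[1.0, 1.0], [1.0, 1.0]]
print([phi(0, j, 2, m) for j in range(2)])
```

Each row of expected probabilities sums to one, which is why the main program of the listing needs to compute only the first N-1 entries and can obtain F(N) by subtraction.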



APPENDIX D

PROGRAM PIAPROX TO COMPUTE
EQUATIONS (4.2.42).



RPROGRAM NAME IS PIAPROX. THIS PROGRAM RECURSIVELY COMPUTES
RVALUES OF THE SUCCESSIVE APPROXIMANT PI(I,T1,M) FOR
RI=1,...,N AND T1=S,...,T. A MATRIX BETA PRIOR IS USED.

PROGRAM COMMON IND»N»LIST »MDIM»N1»ADIM»AIND 

INTEGER N»IND»I*K*S*T*N1»AIND 

DIMENSION MQOO»MDIM) tIND< 10) »LIST< 21000 ) »F< 10 ) »AIND < 10) 

VECTOR VALUES MDIM=2»1*0 

VECTOR VALUES ADIM=2*1»0 
READ READ FORMAT INPUTl* N»S.T 

MDIM(2)»N 

N1=N+1 

ADIM(2)*N1 

AIND(1)«0 

IND<1)*0 

THROUGH ALFAt FOR K*1,1»K.E.N 

AIND<K+1)*IC*N1 
ALFA IND<K*1)»K*N 

READ FORMAT INPUT2* M<1»1) ... M<N»N) 

PRINT FORMAT OUTl* N»S»T»( K»l »1 »K.G.N» ( 1*1 »1* I .G.N » 
OMdND(K)-H) )) 

SET LIST TO LIST 

LIST=0 

THROUGH GAMMA* FOR K=S»1»K.G.T 

THROUGH DELTA* FOR I*1»1»I.6.N 
DELTA F<I) = PI.U»K»M> 

PRINT FORMAT 0UT2* K» F(l) ... F(N) 

SUM*0. 

THROUGH BETA* FOR 1*1*1*1. G.N 
BETA SUM*SUM+F(I) 

THROUGH EPS* FOR 1=1*1*1. G.N 
EPS F(I)«F(I)/SUM 

PRINT FORMAT 0UT3* F(l) ... F<N) 
GAMMA PRINT FORMAT 0UT4* SUM 

TRANSFER TO READ 

RFORMAT SPECIFICATIONS. 

VECTOR VALUES !NPUTl=$3I10*S 

VECTOR VALUES INPUT2*$(7F10.5 )*$ 

VECTOR VALUES 0UT1=$3H1N* » 15 »4H S*,I5»4H T=»I5/ 
0(1H »8G15.5)*$ 

VECTOR VALUES OUT2=$7H0FOR T**I2»10H PI < T»M)*» (6G15.5 ) *$ 

VECTOR VALUES 0UT3*$19H NORMALIZED VECTOR** (6G15. 5 ) *$ 

VECTOR VALUES 0UT4*$1H »S11 »7HC< T*M)**G15.6*$ 

END OF PROGRAM 






EXTERNAL FUNCTION (J1,T1,M)
ENTRY TO PI.

RTHIS FUNCTION RECURSIVELY COMPUTES PI(J1,T1,M), THE T1TH
RSUCCESSIVE APPROXIMANT TO THE J1TH ELEMENT OF THE MEAN
RSTEADY-STATE PROBABILITY VECTOR WHEN THE PRIOR IS MATRIX
RBETA WITH PARAMETER M.

PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND

INTEGER J1,T1,I,J,K,N,T,IND,MDIM,N1,ADIM,AIND

DIMENSION IND(10),LIST(21000),MDIM(2),TM(100,MDIM),PBAR(10),
0ADIM(2),AIND(10)

J=J1

T=T1

THROUGH ALFA, FOR K=1,1,K.G.N

MSUM=0.

THROUGH BETA, FOR I=1,1,I.G.N
BETA MSUM=MSUM+M(IND(K)+I)
ALFA PBAR(K)=M(IND(K)+J)/MSUM

Y=0.

THROUGH GAMMA, FOR K=1,1,K.G.N

SAVE RETURN

SAVE DATA Y,T,PBAR(K) ... PBAR(N),K,J,M(1,1) ... M(N,N)

M(IND(K)+J)=M(IND(K)+J)+1.

WHENEVER T.G.1, TRANSFER TO ZETA

X=PIZRO.(K,M)

TRANSFER TO ETA
ZETA X=PI.(K,T-1,M)
ETA RESTORE DATA M(N,N) ... M(1,1),J,K,PBAR(N) ... PBAR(K),T,Y

RESTORE RETURN
GAMMA Y=Y+X*PBAR(K)

FUNCTION RETURN Y

END OF FUNCTION



EXTERNAL FUNCTION (L1,M)
ENTRY TO PIZRO.

RTHIS FUNCTION COMPUTES THE TERMINAL FUNCTION PI(L1,0,M)
RAS THE L1TH ELEMENT OF THE STEADY-STATE PROBABILITY VECTOR
RCORRESPONDING TO THE MEAN OF THE PRIOR DISTRIBUTION.
RPRIOR IS MATRIX BETA WITH PARAMETER M.

PROGRAM COMMON IND,N,LIST,MDIM,N1,ADIM,AIND
INTEGER L1,L,N,I,J,K,N1,IND,MDIM,ADIM,AIND

DIMENSION IND(10),LIST(21000),MDIM(2),ADIM(2),A(110,ADIM),
0AIND(10)
L=L1

THROUGH ALFA, FOR I=1,1,I.G.N
MSUM=0.

THROUGH BETA, FOR K=1,1,K.G.N
BETA MSUM=MSUM+M(IND(I)+K)

THROUGH GAMMA, FOR K=1,1,K.E.N
GAMMA A(AIND(K)+I)=-M(IND(I)+K)/MSUM
A(AIND(N)+I)=1.
ALFA A(AIND(I)+N1)=0.
A(AIND(N)+N1)=1.
THROUGH DELTA, FOR K=1,1,K.E.N
A(AIND(K)+K)=A(AIND(K)+K)+1.
SCRAP=A(AIND(K)+L)
A(AIND(K)+L)=A(AIND(K)+N)
DELTA A(AIND(K)+N)=SCRAP
DIAG=A(AIND(1)+1)
THROUGH EPS, FOR J=2,1,J.G.N
EPS A(AIND(1)+J)=A(AIND(1)+J)/DIAG
THROUGH ZETA, FOR J=2,1,J.G.N
THROUGH ETA, FOR I=J,1,I.G.N
SUB=A(AIND(I)+J)
THROUGH IOTA, FOR K=1,1,K.E.J
IOTA SUB=SUB-A(AIND(I)+K)*A(AIND(K)+J)
ETA A(AIND(I)+J)=SUB
DIAG=A(AIND(J)+J)

THROUGH ZETA, FOR I=J+1,1,I.G.N1
SUB=A(AIND(J)+I)

THROUGH LAMBDA, FOR K=1,1,K.E.J
LAMBDA SUB=SUB-A(AIND(J)+K)*A(AIND(K)+I)
ZETA A(AIND(J)+I)=SUB/DIAG

FUNCTION RETURN A(AIND(N)+N1)
END OF FUNCTION
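The successive approximation carried out by PI. and PIZRO. can be sketched in Python. This is a paraphrase under assumptions, with hypothetical names: the steady-state vector of the mean matrix is found here by Gauss-Jordan elimination with partial pivoting, which is a substitute for the elimination scheme of the listing, and the argument t counts approximation steps with t = 0 denoting the terminal function.

```python
def mean_matrix(m):
    """Prior mean of the transition matrix under a matrix beta prior m."""
    return [[x / sum(row) for x in row] for row in m]

def steady_state(p):
    """Steady-state vector of an ergodic stochastic matrix p, from the
    equations pi = pi P with the last one replaced by sum(pi) = 1."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(n)] + [0.0]
         for i in range(n)]
    a[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * y for x, y in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def pi_approx(j, t, m):
    """t-th successive approximant to the mean of the j-th element of
    the steady-state probability vector."""
    if t == 0:
        return steady_state(mean_matrix(m))[j]
    pbar = mean_matrix(m)
    total = 0.0
    for k in range(len(m)):
        tm = [row[:] for row in m]
        tm[k][j] += 1.0              # posterior after observing k -> j
        total += pbar[k][j] * pi_approx(k, t - 1, tm)
    return total

m = [[1.0, 1.0], [1.0, 1.0]]
approx = [pi_approx(j, 1, m) for j in range(2)]
print(approx, sum(approx))
```

The approximants need not sum to one, which appears to be why the main program of the listing also prints a normalized vector together with the sum.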



APPENDIX E
PROGRAM VASYMP TO COMPUTE



RPROGRAM NAME IS VASYMP. 

RTHIS PROGRAM RECURSIVELY COMPUTES VALUES OF V(I,J,M) FOR
RI=1,...,N, J=S,...,T. THE REWARD MATRIX IS R AND THE TERM-
RINAL REWARD VECTOR IS RHO. A MATRIX BETA PRIOR IS ASSUMED.
RTHE DISCOUNT FACTOR IS BETA.

PROGRAM COMMON R»RHO»RDIM» IND»N>BETA»LIST»T0P 

INTEGER NtIND»J.StT,K#LtTOP 

DIMENSION R(100»RDIM) »M ( 100. RDIM) »RHO< 10) * INDQO) » VI ( 10 > » 
0V2(10).LIST<21000) 

VECTOR VALUES RDIM*2»1.0 
READ READ FORMAT INPUTl. N»S»T»BETA 

RDIM(2)«N 

IND(1)=0 

THROUGH ALFA. FOR K=1»1»K.E.N 
ALFA IND(K+1)»K*N 

READ READ FORMAT INPUT2» R(l»l) ... R(N»N)t M(l.l) ... M(N»N)» 
ORHO(l) ... RHO<N) 

PRINT FORMAT 0UT1A. BETA 

PRINT FORMAT OUTlB* < <»1 »1 »K.G.N» ( L«l tl »L«G.N» M(IND<K)^LI)? 

PRINT FORMAT OUTlO (K=1»1»K.G.N» <L=1 »1 »L.6.N»R< IMD< K)+L > ) ) 

PRINT FORMAT 0UT1D* ( K=l »1 »K.G.N» RHO<K>) 

THROUGH PHI. FOR K=1.1»K.G.N 
PHI V2(K)=0 # 

SET LIST TO LIST 

LISTaO 

THROUGH DELTA* FOR 

THROUGH GAMMA. FOR 

V1(L)=V.(L»K»M) 
GAMMA V2(L>=V1(L)~V2(L) 

PRINT FORMAT 0UT2. 

PRINT FORMAT 0UT3. 

THROUGH DELTA* FOR 
DELTA V2(L)aVl(L) 

TRANSFER TO READ 



K»S»1»K.G.T 
L*l»l. L.G.N 



K. Vl(l) ... VKN) 
V2<1> ... V2(N) 
L*l»l» L.G.N 



RFORMAT SPECIFICATIONS 
VECTOR VALUES INPUTl=$3 I 10. F10.5*$ 
VECTOR VALUES INPUT2«$< 7F10.5 >*$ 
VECTOR VALUES 0UT1A»$8H1BETA *»G15.5»$ 
VECTOR VALUES OUT1B=$8HO M »»8G15.5/<1H 
VECTOR VALUES 0UT1C«$8H0 R *»8G15.5/(1H 



»8G15.5>*$ 
*8G15.5)*$ 






VECTOR VALUES OUTlD=$8HO RHO *»8G15.5/8G15.5*$ 
VECTOR VALUES OUT2=S8H0FOR T =» I2» 14H V(It T» M> =» 
07G15.5/8G15.5»$ 
VECTOR VALUES OUT3*$S3»21HDELTA V(I, T«l ♦ M) «»7Gl5.5/ 

08G15 # 5*S 
END OF PROGRAM 



EXTERNAL FUNCTION (I1, J1, M)
ENTRY TO V.

RTHIS FUNCTION RECURSIVELY COMPUTES V.(I1,J1,M)=Y, THE TOTAL
REXPECTED DISCOUNTED RETURN IN J1 STEPS IF THE SYSTEM STARTS
RIN STATE I1 WITH PARAMETER MATRIX M. PRIOR IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
INTEGER I1,J1,I,J,K,IND,N,RDIM,TOP

DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000),
0TM(100,RDIM)
I=I1
J=J1
WHENEVER J .E. 0, FUNCTION RETURN RHO(I)

MSUM=0.

THROUGH ALFA, FOR K=1,1,K.G.N
ALFA MSUM=MSUM+M(IND(I)+K)
Y=0.

THROUGH GAMMA, FOR K=1,1,K.G.N
SAVE RETURN

SAVE DATA J,Y,MSUM,M(I,K) ... M(I,N),I,K
EXECUTE TR.(I,K,M,TM)
X=V.(K,J-1,TM)

RESTORE DATA K,I,M(I,N) ... M(I,K),MSUM,Y,J
RESTORE RETURN
GAMMA Y=Y+(M(IND(I)+K)/MSUM)*(R(IND(I)+K)+BETA*X)
FUNCTION RETURN Y
END OF FUNCTION



EXTERNAL FUNCTION (I, K, M, TM)
ENTRY TO TR.

RTHIS FUNCTION EFFECTS THE TRANSFORMATION FROM THE PRIOR
RPARAMETER MATRIX M TO THE POSTERIOR PARAMETER MATRIX
RT.(I,K,M)=TM, WHEN ONE TRANSITION FROM I TO K IS OBSERVED.
RPRIOR IS MATRIX BETA.

PROGRAM COMMON R,RHO,RDIM,IND,N,BETA,LIST,TOP
DIMENSION R(100,RDIM),RHO(10),RDIM(2),IND(10),LIST(21000)
INTEGER I,K,IND,J,L,N
THROUGH ALFA, FOR J=1,1,J.G.N
THROUGH ALFA, FOR L=1,1,L.G.N
ALFA TM(IND(J)+L)=M(IND(J)+L)

TM(IND(I)+K)=TM(IND(I)+K)+1.0
FUNCTION RETURN
END OF FUNCTION



BIBLIOGRAPHY

1. T. W. Anderson and L. A. Goodman, "Statistical inference about Markov
chains," Ann. Math. Stat. 28 (1957), 89-110.

2. P. Appell and J. Kampé de Fériet, Fonctions Hypergéométriques et
Hypersphériques, Gauthier-Villars, Paris (1926).

3. B. Arden, B. Galler, and R. Graham, The Michigan Algorithm Decoder,
[n.p.] (November, 1963).

4. J. A. Ayers, "Recursive programming in FORTRAN II," Comm. ACM 6
(1963), 667-668.

5. G. A. Barnard, "Sampling inspection and statistical decisions,"
J. Roy. Stat. Soc. Ser. B 16 (1954), 151-174.

6. R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton
University Press, Princeton (1961).

7. ________, "A problem in the sequential design of experiments,"
Sankhya 16 (1956), 221-229.

8. ________ and R. Kalaba, "On adaptive control processes," IRE
Trans. vol. AC-4, no. 2 (1959), pp. 1-9.

9. B. R. Bhat, "Bayes solution of sequential decision problem for Markov
dependent observations," Ann. Math. Stat. 35 (1964), 1656-1662.

10. P. Billingsley, "Statistical methods in Markov chains," Ann. Math.
Stat. 32 (1961), 12-40; see also correction in ibid., p. 1343.



11. D. Blackwell, "Discrete dynamic programming," Ann. Math. Stat. 33
(1962), 719-726.

12. K. L. Chung, Markov Chains with Stationary Transition Probabilities,
Springer, Berlin (1960).

MS Thesis, Massachusetts Institute of Technology (May, 1964).

14. ________, R. Gonzalez-Zubieta, and R. L. Miller, Markovian
Decision Processes with Uncertain Transition Probabilities, Technical
Report No. 11, Research in the Control of Complex Systems. Operations
Research Center, Massachusetts Institute of Technology (March, 1965).

15. R. Dawson and I. J. Good, "Exact Markov probabilities from oriented
linear graphs," Ann. Math. Stat. 28 (1957), 946-956.

16. C. Derman, "On sequential decisions and Markov chains," Man. Sci. 9
(1963), 16-24.

17. A. Erdélyi, et al., Higher Transcendental Functions, 3 vols.
McGraw-Hill, New York (1953-55).

18. M. Freimer, "A dynamic programming approach to adaptive control
processes," IRE Trans. vol. AC-4, no. 2 (1959), pp. 10-15.

Lincoln Laboratory Report 54G-0020. Massachusetts Institute of Technology
(13 April 1960).

20. I. J. Good, "The frequency count of a Markov chain and the transition
to continuous time," Ann. Math. Stat. 32 (1961), 41-48.



3i« J. S. r-feslE&aa, **0n the ees!poun3 aaltinoe&al distribution, the 
saltiTOriat© p-distributionf sad correlations among proportions," 
i& (1962), 65-82. 



32. P. Naur (od.) t wRepert on tho algorithmic language ALOOL 60, M 
AOU 1 (i960), 299-31**. 

33. H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory.
Graduate School of Business Administration, Harvard University, Boston
(1961).

34. D. Rosenblatt, "On linear models and the graphs of Minkowski-
Leontief matrices," Econometrica 25 (1957), 325-338.

35. W. Rudin, Principles of Mathematical Analysis, 2nd ed. McGraw-Hill,
New York (1964).

36. L. S. Shapley, "Stochastic games," Proc. Nat. Acad. Sci. 39 (1953),
1095-1100.

37. N. Z. Shor, "Pro optimal'ne regulyuvaniya Markovs'koi poslidovnosti
z dvoma fazovimi stanami" (On the optimal control of a Markov chain with
two phase states; Ukrainian, with Russian summary), j&JEnlk £Bfil&I £
(1961), 119-124.

38. E. A. Silver, Markovian Decision Processes with Uncertain Transition
Probabilities or Rewards. Technical Report No. 1, Research in the
Control of Complex Systems. Operations Research Center, Massachusetts
Institute of Technology (August, 1963).



39. A. Singer, "The steady state probabilities of a Markov chain as a
function of the transition probabilities," Opns. Res. 12 (1964),
498-499.

40. G. B. Wetherill, "Bayesian sequential analysis," Biometrika 48
(1961), 281-292.

41. P. Whittle, "Some distribution and moment formulae for the Markov
chain," J. Roy. Stat. Soc. Ser. B 17 (1955), 235-242.

42. L. E. Zachrisson, "Markov games," in Advances in Game Theory,
M. Dresher, L. S. Shapley, and A. W. Tucker, eds. Princeton University
Press, Princeton (1964), pp. 211-253.



BIOGRAPHICAL NOTE



James J. Martin, Jr. was born on February 3, 1936 in Paterson, New
Jersey, where he attended elementary school and Eastside High School. In
1951, after completing three years of high school, he entered the
University of Wisconsin, where he was supported by a Ford Foundation
scholarship. In June, 1955, he was awarded a Bachelor of Arts degree
in Physics by the University of Wisconsin and won the Vilas Prize Essay
Contest.

Mr. Martin studied Theology at the Harvard Divinity School from
1955 to 1957. In July, 1957, he graduated from Officer Candidate School
and was commissioned an Ensign in the U. S. Naval Reserve. Mr. Martin
is a career Naval officer on active duty and currently holds the rank
of lieutenant. He has served as Executive Officer in USS BRATTLEBORO
(EPCER 852) and as Engineer Officer in USS VESOLE (DDR 878). From
1961 to 1963, Mr. Martin attended the U. S. Naval Postgraduate School,
where he studied Operations Research, receiving the degree of Master of
Science in May, 1963. He has been enrolled in the graduate school of
Massachusetts Institute of Technology since February, 1963.

Mr. Martin's publications include "Multinormal Bayesian Analysis:
Two Examples," published by the Sloan School of Management of M.I.T.,
and "Distribution of the Time Through a Directed, Acyclic Network,"
published in Operations Research. He presented a paper entitled "On
the Expected Gain of a Markov Chain with Uncertain Transition
Probabilities" at the Twenty-Seventh National Meeting of the Operations
Research Society of America.

Mr. Martin is a member of the Tau Beta Pi and Sigma Xi honor
societies and belongs to the Operations Research Society of America.

In 195…, Mr. Martin was married to the former Miss Betty Bent of
Benton, Wisconsin. They now have four children.